GA

One of my favorite Chinese words is GANGA (pronounced as in "Lady Gaga", but put a nasal at the end of the first syllable).  It is so special and has had such a deep impact upon me since I began learning Chinese half a century ago that, in this post, I shall refer to it simply as "GANGA", in capital letters only, except when discussing its more precise pronunciation, derivation, meaning, and written representation in Chinese characters.  Referring to this unusual word as "GANGA" is meant to emphasize the iconic quality it has for me personally, in the sense that its nature reveals many verities about Sinitic languages and Chinese writing.

Before going further, I am obliged to tell you what GANGA means:  "awkward; embarrassed".  It's obviously a useful word, and its very sounds synesthetically mimic the feelings that it designates.  Of course, English "awkward" is pretty good at doing that too, but with the English word each syllable is a morpheme, so you can analyze it as "back-handed" and "turned toward".

With GANGA, neither of the syllables by itself means anything.  It's one of those disyllabic morphemes, of which there are plenty in Sinitic:  wēiyí 委蛇 / 逶迤 ("meandering; winding"), qílín 麒麟 ("kirin" [not "unicorn"] — I will write a paper or post about its true identity sometime), fènghuáng 鳳凰 (so-called "phoenix"), pípá 琵琶 ("biwa; lute"), pútáo 葡萄 ("grape"),  zhīzhū 蜘蛛 ("spider"), shānhú 珊瑚 ("coral"), qiūyǐn 蚯蚓 ("earthworm"), xīshuài 蟋蟀 ("cricket"), húdié 蝴蝶 ("butterfly" — of mythical proportions, deftly dissected by George Kennedy (1901-1960), the brilliant Yale professor), and so forth.  See "'Butterfly' words as a source of etymological confusion" (1/28/16).  Most of these expressions are ancient and have more than one Sinographic form, and several of them have Iranian antecedents.  All of them provide powerful evidence of the priority of sound over written Sinographic form.  GANGA, as we shall see below, belongs squarely to this salient class of words.

GANGA is a strange creature.  I don't know how I was exposed to such an odd expression during the first year of my study of Mandarin, probably from my wife and in-laws.  They were Shandongese, but had spent a decade in Sichuan during WWII, before moving on to Taiwan where they settled for about thirty years.  As I have described in numerous Language Log posts, when I began the study of Mandarin, I was only interested in the spoken language and really didn't care about the characters.  So I learned how to say GANGA long before I knew how to write it in characters.

When I finally did learn how to write GANGA in characters, I found it all the more fascinating, because the very Sinographic written form of this word appeared bizarre to me.  Like the sound, the written form of GANGA evinces awkwardness.  Here's what it looks like:

尴尬 (simpl.) 尷尬 (trad.) / 尲尬

(The Hanyu Pinyin romanization for Modern Standard Mandarin (MSM) is gāngà.)

First of all, GANGA is written with a rare and ungainly radical, which I shall discuss in greater detail below.  Second, the phonophores used to render the sounds of the two syllables would normally be read as jian and jie (not taking tones into consideration for the moment) in MSM.  Thus both the semantophore and phonophores underscore the weirdness of the written form of this word.

In their Concise Dictionary of Spoken Chinese, Yuen Ren Chao and Lien Sheng Yang say that GANGA is a Wu topolect word.  The Chinese linguist Qian Nairong, who is a researcher and advocate of Shanghainese, also considers GANGA to be a Wu topolect expression.  I'm not surprised at this, because somehow it doesn't seem like a Mandarin expression, but rather like a southernism (pronounced roughly kamkai).  It has always sounded to me like something Shanghainese would say.  Since I don't know of it in Cantonese and Taiwanese, except as a superficial borrowing, and since it has spread fairly widely in Mandarin, it seems possible, or even likely, that it might well have come from Wu.  GANGA may have come into Cantonese when large numbers of Shanghainese people arrived in Hong Kong during the 1940s and 1950s, but it could well have been earlier since there has long been a considerable Cantonese-speaking population in Shanghai.

GANGA is a common expression in Shanghainese, where it means:  1. "awkward; embarrassed", 2. "in straits".  In its modern incarnation, I'm content to accept GANGA as fundamentally a Wu topolect word that has spread across much of the Sinophone world, though not deeply into the other topolects, although it has become extensively assimilated by MSM (the national language).  However, despite its manifestly close association with the spoken realm, GANGA — like the other disyllabic morpheme words cited above — can be securely traced to ancient times.

From Wiktionary:

The original form, 尲尬 (OC *kreːm kreːds), was attested in the ancient dictionary of Shuowen [2nd century CE] and was discernibly an ideophonic, disyllabic word of reduplicative nature (連綿詞).

The term 尲尬 seemed to have fallen into disuse by medieval times, except dialectally, in the Wu region. The dialectal form of it was revived in literature in the form of 尷尬 since the Yuan dynasty, and the colloquial Mandarin pronunciation gāngà was borrowed thence.

Others may disagree with me, but when I see the Old Sinitic reconstruction of a binom like *kreːm kreːds, I cannot help but invoke dimidiation, which in this case would yield something like *kremds.  Perhaps this might suggest cognates in Sino-Tibetan, Tibeto-Burmese, IE, or other language families.

The original meaning of this word (now written 尷尬) in Old Chinese was xíng bùzhèng yě 行不正也 ("not walking properly / correctly").  The semantophore 尢 (no. 43 in the Kangxi system, pron. wāng in MSM ["lame"]) for both sinographs of gāngà 尷尬 refers to "a bow-legged man" (qū jìng rén yě 曲脛人也).  Among the 25,000 or so highest frequency characters, only around 30 are classified under this semantophore, and most of them have something to do with such a handicap.

As for its Sinographic form, 尴尬 (simpl.) 尷尬 (trad.) and 尲尬, there are at least seven other different ways to write GANGA, including three sets of characters that use the "ghost" (guǐ 鬼) radical.  See Hànyǔ dà cídiǎn 漢語大詞典 (Unabridged dictionary of Sinitic), 2.1581b and 12.476ab.

Chau Wu, author of "Patterns of Sound Correspondence between Taiwanese and Germanic/Latin/Greek/Romance Lexicons, Part I", Sino-Platonic Papers, 262 (Aug., 2016), 239 pp. (free pdf) ("Eurasian eureka" [9/12/16]), is currently preparing Part II of that study.  In it, he will delve deeply into a pattern of sound correspondences that will explain the trans-Sinitic pronunciations of gāngà 尷尬 and similar binoms.

Enough of philology!  Now I would like to bring GANGA developments up to the present moment.  It so happens that the GA of GANGA has taken on a life of its own and become almost a pop culture meme in the Sinophone world.  See Feng Biyi, "How ‘Ga’ Expresses the Growing Pains of Chinese Youth:  An exploration of the curious coping mechanisms birthed by growing up in a heavily scrutinized society", Sixth Tone (7/30/17).

The article includes a link to a must-watch viral video (01:38) of a spontaneous GA eruption in the middle of a busy intersection as police try to arrest a man on a motorcycle.  It may be somewhat difficult to download, and there might be funny things floating across the screen while it plays, but it is definitely worth watching.  BTW, this takes place on Hainan island, the southernmost part of China (from which point the PRC stakes its claim to the whole of the Southeast Asia Sea), but it's difficult to understand more than a few words of what is being said.

How did this meaning of GA come about, and how does it function in contemporary China?

Here's how Yixue Yang puts it:

I love gà 尬 ever since it flew solo from gān 尴, with its crispy and exotic sound. New words made with it, for example gàliáo 尬聊 ("chat awkwardly") and gàxiào 尬笑 ("laugh awkwardly"), exuberate with a strong sense of humor, even though referring to awkwardness of perhaps the severest kind.

The situation in Mainland China is much complicated by the fact that we seem to have two strains of GA interfusing into a single meme.  On the one hand, we have the GA of GANGA directly fissioned off from GANGA as described in the preceding paragraph, where it means "awkward(ly)", as in gàliáo 尬聊 ("chat awkwardly") and gàxiào 尬笑 ("laugh awkwardly"), but also gàchàng 尬唱 ("sing awkwardly") and above all (as we saw in the viral video) gàwǔ 尬舞 ("dance awkwardly").  In the latter two instances, the notion of "competing" also gets worked in, as with middle-aged and older folk singing and dancing in public squares or in parks.  How did that happen?

Here it gets complicated, because we're dealing with opaque, cross-topolectal usage of characters.  What I describe in the following paragraphs probably won't make a lot of sense to many people reading it, but I have to go through with it anyway, because it is really how characters are used to write key elements of the topolects (when they are written).

Because the GA of GANGA sounds like Taiwanese kàu / khà / khah / kah / kà 較 ("compete; compare; comparatively; relatively"), 尬 can be used to write 較, so the meanings of 尬 detached from gāngà 尷尬 ("awkward; embarrassed") are apt to get mixed in with the meanings of 較 ("compete; compare; comparatively; relatively").  Thus we can have awkward, amateurish song and dance competitions designated by 尬 / 較.

[Interpolation:  if you think that writing Mandarin in Chinese characters is hard, you have no idea how difficult it is to write Taiwanese, Cantonese, Shanghainese, etc. in characters.]

The key point I wish to make here is that, years before gàwǔ 尬舞 ("dance awkwardly") became popular on the Mainland, there was an online craze for dance competition in Taiwan that was styled 尬舞, where 尬 meant "compete" (= 較), not "awkward", which the first Chinese character of the Taiwanese term 尬舞 superficially indicates for non-Taiwanese speakers.  When the craze for amateur dance competitions transferred to the Mainland, the original, non-Taiwanese / Hokkienese meaning of gà 尬 came to the surface, and the idea of "awkward" asserted itself together with the latent idea of competition that came along as cultural Taiwanese / Hokkienese baggage from Taiwan.

In addition to the online game gàwǔ 尬舞 ("dance competition") that was so popular in Taiwan, there was also a song by the Taiwanese band May Day called yà / zhá / gá chē 軋車 or gàchē 尬車 (using MSM pronunciations in this paragraph) which was extremely popular. The song, which describes the thrill of motorcycle racing and teenage rebellion, also became phenomenally popular on the Mainland (see this Baidu article) and undoubtedly was an instrumental factor in transporting the Taiwanese meaning of "compete" for gà 尬 to China, where it otherwise would have been completely unknown.

In Taiwan, the character gà 尬 is commonly used for competition among the young generation, such as gàchē 尬車 ("racing motorcycles / cars"), gàwǔ 尬舞 ("dance competition"), gàgē 尬歌 ("singing competition"), gàqiú 尬球 ("sports competition involving a ball"), etc. However, in those contexts, gà 尬 has nothing to do with the original meaning in gāngà 尷尬 ("awkward; embarrassed"). Therefore, unlike in China, gàwǔ 尬舞 in Taiwan simply means a fierce dance competition (usually one-on-one), and doesn't carry the meaning of being awkward.

Apart from gà 尬 with the Taiwanese / Hokkienese meaning of "compete", gāngà 尷尬 with its Mandarin meaning of "awkward; embarrassed" became a hot topic in Taiwan recently when it was discovered that most young people didn't know how to pronounce it as gāngà, but were pronouncing it according to the more obvious sounds of the two phonophores, thus jiānjiè. The online dictionary of the Ministry of Education even made this wrong pronunciation official.  Go here and search for 尷尬.

Although gāngà 尷尬 can be read in Taiwanese, it is not used conversationally to express a feeling of awkwardness or discomfiture.  Instead, Taiwanese speakers would say things like chìn-thòe-lióng-lân 進退兩難 ("dilemma; perplexity"), phái-hoat-lō 歹發落 ("difficult to handle / deal with"), etc.

Bottom line:  GA in Taiwan mainly means "compete"; GA on the Mainland signifies "awkward" or "compete awkwardly".

[h.t. Ben Zimmer; thanks to Don Snow, Jinyi Cai, Wenkan Xu, Chau Wu, Melvin Lee, Sophie Ling-chia Wei, David Prager Branner, and Grace Wu]



51 Comments »

  1. Michael Watts said,

    August 6, 2017 @ 1:54 pm

    gāngà 尷尬 with its Mandarin meaning of "awkward; embarrassed" became a hot topic in Taiwan recently when it was discovered that most young people didn't know how to pronounce it as gāngà, but were pronouncing it according to the more obvious sounds of the two phonophores, thus jiānjiè.

    Did they know the spoken word gāngà and fail to connect it with 尷尬, or is this just a case of guessing at the pronunciation of an unfamiliar word?

  2. Victor Mair said,

    August 6, 2017 @ 2:01 pm

    The latter.

  3. Mara K said,

    August 6, 2017 @ 2:30 pm

    Is "ga" by itself related to the American English expression of frustration "gah"?

    Also I'm looking forward to your post on the kirin. I first heard of it as a creature type from Magic: the Gathering's vaguely Central Asian-esque setting Tarkir, and I'm glad it exists independently of the game.

  4. Frank said,

    August 6, 2017 @ 2:42 pm

    Is it perhaps not coincidental that the word 'gagá' (loopy, doddering) occurs in romance languages including French, Spanish and my native Portuguese?

  5. Victor Mair said,

    August 6, 2017 @ 3:24 pm

    I would be extremely interested in learning more about Romance "gagá" ("loopy, doddering"), its etymology, cognates in other language groups, historical time-depth and development, semantic range, and so forth.

    I wonder if that's where Lady Gaga got her name?

  6. Mara K said,

    August 6, 2017 @ 3:27 pm

    The best-known story is that Lady Gaga got her name from the Queen song Radio Gaga. Which probably comes from the English "gaga" (close in meaning to the Romance language version, but with more of a connotation these days of intense interest in/enthusiasm about something), which is probably related to the Romance language version.

  7. Michael Watts said,

    August 6, 2017 @ 4:21 pm

    etymonline says this about the english "gaga":

    gaga (adj.)
    "crazy, silly," 1920, probably from French gaga "senile, foolish," probably imitative of meaningless babbling.

  8. Victor Mair said,

    August 6, 2017 @ 4:33 pm

    Thank you, Mara K and Michael Watts. I've long been interested in the origin of Lady Gaga's name.

    About five years ago when I was on one of my last expeditions to the Tarim Basin, the Chinese driver of our car had a few songs on disks, but he only liked "Poker Face", and he must have played it a hundred times as we crossed the desert for several thousand miles. That was the first time I ever heard the song, and I liked it the first thirty or forty times I listened to it on that trip. But after a while….

    It didn't help that I couldn't understand a lot of what she was saying, though I sort of thought that I could half understand much of it (like Taylor Swift's Starbucks / starcrossed lovers (??)). And I'm pretty sure that our driver, who knew only a few words of English, was most attracted by the p p p part.

  9. Mara K said,

    August 6, 2017 @ 5:57 pm

    A bad experience at college marching band camp left me negatively disposed toward Lady Gaga (and toward marching band), but I'm told she set herself up as a role model for teenage misfits, and having been one of those, I don't mind her music anymore. (Can't watch her music videos though. They're just creepy.)

    Related: Someone has written a parody novel called Taylor Swift, Girl Detective: Secrets of the Starbucks Lovers.

  10. David Marjanović said,

    August 6, 2017 @ 7:02 pm

    when I see the Old Sinitic reconstruction of a binom like *kreːm kreːds, I cannot help but invoke dimidiation, which in this case would yield something like *kremds

    I've never heard of such "dimidiation" – reduplication followed by peculiar dissimilation??? – occurring in any language at all.

    Also, whether the Sino-Tibetan language family (or "Transhimalayan" as some call it) really has a "Tibeto-Burman" branch remains unclear; in other words, the position of the Sinitic branch within Sino-Tibetan remains unclear due to the scarcity of currently understood evidence and the conflicts in it.

    Chau Wu, author of "Patterns of Sound Correspondence between Taiwanese and Germanic/Latin/Greek/Romance Lexicons, Part I", Sino-Platonic Papers, 262 (Aug., 2016), 239 pp. (free pdf) ("Eurasian eureka" [9/12/16]), is currently preparing Part II of that study. In it, he will delve deeply into a pattern of sound correspondences that will explain the trans-Sinitic pronunciations of gāngà 尷尬 and similar binoms.

    If Part II will be at all similar to Part I, it'll be another heap of indefensible pseudoscience, as I tried to explain in my comment to the "Eurasian eureka" post.

  11. Victor Mair said,

    August 6, 2017 @ 9:07 pm

    "I've never heard of such 'dimidiation' … occurring in any language at all."

    Well, sir, now you've heard of it, though it's not what you imagined it to be: "reduplication followed by peculiar dissimilation???"

    See William Boltz's article on this subject in the new Encyclopedia of Chinese Language and Linguistics.

    As for Sino-Tibetan, Tibeto-Burman, etc., those of us who study Sinitic are painfully aware of the problems of how to classify it. Did someone in this post ("GA") say that "Sino-Tibetan family… really has a 'Tibeto-Burman' branch"? Not I. Please reread the o.p. more carefully and without jumping to conclusions.

    If you think Chau Wu's SPP 262 is nothing but a "heap of indefensible pseudoscience", you'd better look again, because it contains a tremendous amount of valuable information concerning Taiwanese (Hoklo) and its relationships to other Sinitic languages.

    Finally, in the spirit of this blog, please be less judgemental, righteous, and intemperate in the language you use, especially when you are not well informed about the topics under discussion. Try to be less categorical and somewhat more nuanced in your proclamations and denunciations, i.e., try to be more civil.

  12. shubert said,

    August 6, 2017 @ 9:12 pm

    It also means clandestinely, sneaky.

  13. B.Ma said,

    August 7, 2017 @ 12:44 am

    I hear GAMGAI in Cantonese often, but those who have not previously learned the Mandarin pronunciation will invariably say JIANJIE (due to the general rule that Cantonese ga- becomes Mandarin jia-)

  14. Guy_H said,

    August 7, 2017 @ 5:15 am

    My memory is a bit fuzzy but I've always thought jian1 jie4 was an alternate pronunciation of 尷尬 in Mandarin? I feel like I was taught that in school (in Taiwan). I've never heard anyone say it like that in speech though.
    Kind of like la1 ji1/le4 se4 for 垃圾 or pu3 bian4/pu3 pian4 for 普遍.

  15. Bev Rowe said,

    August 7, 2017 @ 8:05 am

    This article has (at least) three words not in the OED: topolect/al, binom and semantophore. (Is that linguistic metainformation?)

  16. flow said,

    August 7, 2017 @ 8:45 am

    @Guy_H—this. Also not only 垃圾, 尷尬, but also 旮旯兒 galar / 角落 jiaoluo (BTW zdic.net also has 旮旮旯旯儿 gāga-lálár 'all corners': 〈方〉∶房屋、庭院、街道的所有角落及曲折隐蔽之处; ex.:
    找遍了旮旮旯旯儿也没有找到丢失的东西).

    'gaga' is also frequently heard in German with the meaning of 'crazy, imbecile, stupid, unreasonable'. This I'd say is how people in Germany most likely understand 'Lady Gaga' (crazy woman) and that Queen's title, 'Radio Gaga' (Radio Gibberish, i.e. person or media that is source of incoherent, unintelligible, inconsequential utterances).

    Other topic, re David Marjanović "another heap of indefensible pseudoscience"—Is there any accessible source that could convince a hobby linguist with some background in Mandarin and Japanese of the reality of Tibeto-Burman and/or Sino-Tibetan? I mean the people living in present-day East Asia sure have a history of intercultural exchange going back thousands of years, so naturally there are similarities to be found in their languages; does anything along those lines prove Tibetan and Mandarin have the same roots, their speakers common ancestors?

  17. Victor Mair said,

    August 7, 2017 @ 9:22 am

    @Bev Rowe

    On "topolect", see:

    "The American Heritage Dictionary of the English Language, 5th edition" (11/14/12)

    http://languagelog.ldc.upenn.edu/nll/?p=4326

    The other terms are used in Sinological publications.

  18. Mara K said,

    August 7, 2017 @ 10:10 am

    Another thought, inspired by the morning crossword: is English "gaga" related to "agog"?

  19. Chris Button said,

    August 7, 2017 @ 3:14 pm

    @ flow

    A good place to start would be Jim Matisoff's "Handbook of Proto-Tibeto-Burman" which is freely downloadable on the Berkeley website.

    You'll rapidly realize that the biggest problem in the whole Sino-Tibetan/Tibeto-Burman enterprise is the vowels. It's all quite unfortunate really because linguists are on the whole incredibly reluctant to accept the reality that vowels are nothing but a surface phenomenon. I can do no better than to quote the late Søren Egerod in this regard writing about Old Chinese in 1971:

    "… we have thus reached the well-known stage in the reconstruction from Proto-Indo-European: a prevalent tendency to empty the vowels of all features which can possibly be placed outside them. In Chinese, we have the extra impetus of rimes for doing this, lacking, of course, for Indo-European."

    The comment about the Chinese rimes is particularly significant, because, as you may be aware, the prevailing reconstructions at the moment involve forcing a more "linguistically acceptable" set of vowels onto the system.

    The good thing about Matisoff's work is he doesn't force the system to meet a preconceived agenda, but instead posits "allofamic" variation so as not to sweep anything under the rug. He has actually just removed "e" and "o" from his Tibeto-Burman system and readily acknowledges that evidence for "i" and "u" is marginal at best.

  20. Victor Mair said,

    August 7, 2017 @ 5:11 pm

    Thanks very much for that, Chris.

    Speaking of James Matisoff and Sino-Tibetan (and Tibeto-Burman), this afternoon, just a couple of hours ago, the mailman plunked down on my front stoop a yuge box. In it is Matisoff's Sino-Tibetan Etymological Dictionary and Thesaurus (STEDT). I won't tell you how much it cost, but I will tell you how much it weighs: 27 lbs 11 oz.

    I'm very much looking forward to diving into it tomorrow!

    http://stedt.berkeley.edu/

  21. Adrian said,

    August 7, 2017 @ 5:36 pm

    My first thought when I saw "Ganga" was India's sacred river. (<Sanskrit "gam" = go, according to Wiktionary)

  22. Jonathan Smith said,

    August 7, 2017 @ 7:44 pm

    And note that ga is a highly marginal Mandarin syllable; MC homophones of 尬 (誡, etc.) all went to jie4 (i.e., underwent palatalization). Thus the suspicion it is a southernism.

  23. Jonathan Smith said,

    August 7, 2017 @ 7:49 pm

    Or maybe the word resisted regular sound change in the north because it is/was sound symbolic

  24. Victor Mair said,

    August 7, 2017 @ 9:51 pm

    From François Demay:

    Voici la notice étymologique du Trésor de la langue française

    GAGA, subst. et adj.

    Étymol. et Hist. 1879 (A. Daudet, Rois en exil, éd. A. Fayart, p. 25 : D'affreux bourgeois qui ne comprennent pas que si la monarchie est condamnée, il vaut mieux qu'elle meure en combattant, roulée dans son drapeau, plutôt que de finir dans un fauteuil de ga-ga poussé par quelque Parlement). Onomatopée faite à l'imitation du bredouillement des personnes retombées en enfance; le rapprochement avec gâteux* n'est que secondaire. (Cf. FEW t. 4, p. 20b-21a; Bl.-W.5).

  25. flow said,

    August 8, 2017 @ 6:36 am

    @VHM—since you left us with nothing but these barren figures, we have to take it from there. 27lbs 11oz equals 12.559kg in the rest of the world (or one and a quarter garden buckets). The STEDT project page states it was started in 1987, 30 years or, roughly, 30 * 200 = 6000 working days ago, which gives us approximately 2 grams of output per day. If that was coffee, you could have a cup every 2.5 to 3.5 days, depending on how strong you like it. But this is paper, which should weigh around 80 grams per square meter (or rather less for book paper, I guess); hence, we arrive at an estimated 314 square meters (2 * 12559 / 80) of printed surface. That is a square of 17.72 meters by 17.72 meters. Surface area per day is roughly 523 square centimeters (area / ( 30 * 200 ) * 100 * 100), a square of 22.9 cm x 22.9 cm. Incidentally, this is almost exactly one page of A4 standard European paper (21cm x 29.7cm) per day.
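
The arithmetic in the comment above can be rechecked with a short script. This is only a sketch and is not part of the original comment: the function and constant names are made up for illustration, and it simply adopts the comment's own assumptions (30 years of 200 working days each, paper at 80 g per square meter, both sides of each sheet printed).

```python
import math

# Exact avoirdupois conversion factors.
GRAMS_PER_POUND = 453.59237
GRAMS_PER_OUNCE = GRAMS_PER_POUND / 16


def book_output_estimate(pounds=27, ounces=11, years=30,
                         days_per_year=200, paper_gsm=80.0, sides=2):
    """Recompute the comment's figures (hypothetical helper, illustrative only)."""
    mass_g = pounds * GRAMS_PER_POUND + ounces * GRAMS_PER_OUNCE
    working_days = years * days_per_year
    # Sheet area is mass / grammage; printed surface counts both sides.
    printed_area_m2 = sides * mass_g / paper_gsm
    # Convert m^2 to cm^2 (100 * 100) after dividing by working days.
    area_per_day_cm2 = printed_area_m2 / working_days * 100 * 100
    a4_side_cm2 = 21.0 * 29.7  # one side of an A4 sheet
    return {
        "mass_kg": mass_g / 1000,
        "working_days": working_days,
        "grams_per_day": mass_g / working_days,
        "printed_area_m2": printed_area_m2,
        "square_side_m": math.sqrt(printed_area_m2),
        "area_per_day_cm2": area_per_day_cm2,
        "daily_square_side_cm": math.sqrt(area_per_day_cm2),
        "fraction_of_a4_side": area_per_day_cm2 / a4_side_cm2,
    }


if __name__ == "__main__":
    for name, value in book_output_estimate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the script reproduces the comment's 12.559 kg, roughly 2 g per day, about 314 square meters, and about 523 square centimeters per day; the last value comes to roughly 0.84 of one side of an A4 sheet.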

  26. flow said,

    August 8, 2017 @ 6:38 am

    @Chris Button—thanks for the pointer, I found Matisoff's oeuvre at http://www.ucpress.edu/book.php?isbn=9780520098435. It is dense, to say the least; I'll try the section on Chinese to get started.

  27. Rodger C said,

    August 8, 2017 @ 6:52 am

    linguists are on the whole incredibly reluctant to accept the reality that vowels are nothing but a surface phenomenon

    Reminds me of the old definition of historical linguistics as "a science in which the vowels count for nothing and the consonants for very little."

  28. Victor Mair said,

    August 8, 2017 @ 7:21 am

    "Incidentally, this is almost exactly one page of A4 standard European paper (21cm x 29.7cm) per day."

    So? And? Do you have a conclusion to draw from all of your computations and that incidental remark?

  29. flow said,

    August 8, 2017 @ 7:42 am

    @VHM—yes, I think it is quite a bit, especially considering that it is only an average, it had to be sustained over decades, and I also imagine the fact density to be quite high.

  30. David Marjanović said,

    August 9, 2017 @ 3:38 pm

    Dear Prof. Mair,

    I am sorry that I let my tone distract you from the issues I tried to bring up. Please rest assured that I respect you deeply for a long list of reasons. But haven't we all fallen for unparsimonious hypotheses at the very edge of our knowledge? I certainly have, repeatedly.

    I can't access the article on dimidiation you recommended, not even through the usual dark channels. Searching for it on Google Books, however, led me to some of your own writing, where gāngà is mentioned and dimidiation seems to be explained as the insertion (epenthesis) of vowels into syllable-initial consonant clusters, a process that has indeed been suggested – as a regular sound law, AFAIK – for the origin of certain MSM disyllables from OS monosyllables. That, however, would not explain *kreːm kreːds from *kremds, where the *kr- would be retained unchanged.

    So, to the topic: Wu's (2016) paper.

    The title is already a red flag for any historical linguist: "Patterns of Sound Correspondence between Taiwanese and Germanic/Latin/Greek/Romance Lexicons". Why not Old Sinitic* and Proto-Indo-European? Wu explains (p. 5):

    The present study also attempts to present lexical correspondences between the Western and Eastern languages, and differs from all the above (except Haldeman’s and Østmoe’s works) in three major aspects. Firstly, it compares written lexicons of Germanic, Latin, Greek and Romance languages with a living language, modern Taiwanese, whereas the studies cited above are concerned with comparing reconstructed Old Chinese (that is, reconstructed sounds, because its script was not alphabetic or syllabic) — or even Proto-Chinese — with Proto-Indo-European. Thus, the present study is based on known sounds whereas the earlier studies suffer from the fact that, as Wei succinctly puts it, “the reconstruction is built on an edifice of inferences” (Wei 2005). Reconstruction of Middle Chinese has a solid linguistic basis because of the thorough investigation of the rime systems in the successive series of official rime books following the publication of Qièyùn 切韻 by Lù Fǎyán 陸法言 in 601 CE. However, because of the lack of rime books and the qièyùn system before 601, the sound system of Old Chinese has to be based on inferences. Therein lies an unknown degree of uncertainty.

    This is a really extreme case of letting the perfect be the enemy of the good. The reconstruction of PIE has been pretty stable in the last 30 to 40 years despite several attempts to overhaul it drastically; and the latest few reconstructions of OS – by Pulleyblank, Schuessler, Starostin, Zhèngzhāng, and Baxter & Sagart – are all so similar that it looks like they've largely converged on a solution. Sure, even the latest one (Baxter & Sagart 2014) isn't unassailable, but the basic features all seem solid. (For instance, see here, citing a Handel paper, on whether medial *-r- was really so common – followed by this discussion of the personal name 范師蔓 in a Buddhist text, which remains opaque in MSM (fànshīmàn) and in MS (bjomX ʂij manH), but whose reconstructed OS form with two medial */r/s (*bromʔ sri man-s) makes sense as brahma- śrīmān.)

    The younger your sources, the greater become the dangers of mistaking accidental lookalikes for related words and of not recognizing related words because they don't look alike anymore. Let me emphasize that the first danger, too, is real and pervasive: consider this very sarcastic blog post by a working historical linguist or this article which is done less seriously/sarcastically but concludes:

    All this is intended to show how easy it is to find such spurious correspondences. But that's not the end of it; Ruhlen & Greenberg have the opposite problem as well: there's not only too much variation in their list, there's too little. Languages that really are related have diverged much more in 6000 years than some of R&G's words seem to have diverged in at least 10,000.

    Hock uses Hindi and English as an example. The following words, for instance, are real cognates:

    cakka: wheel
    pa:nch five
    si:~g horn
    chah six
    pissu: flea

    Surely R&G would be embarrassed to pick words that far apart as cognates for Proto-World; with that level of phonetic resemblance, everything is related to everything. On the other hand, they might seize upon such a pair as Hindi lu:t. 'rob' and English 'loot'… but these are not cognates; English borrowed the word from Hindi. The actual English cognate of Hindi lu:t. is 'leaf'… which illustrates as well some of the semantic divergence that can occur in 6000 years.

    Applied to the Indo-European family (which we know from careful comparative work to be related), R&G's mass comparison would yield large numbers of both false positives (lu:t. and loot, day and dies, have and habere) and large numbers of false negatives (cakka: and wheel, lu:t. and leaf, date and dacha, milk and lettuce). Applied to unrelated languages, the method will generate long lists of bogus resemblances due to chance (as in my Quechua/Chinese comparison above).

    * (Obviously, Proto-Sino-Tibetan would be even better, but it hasn't been reconstructed. Matisoff's valiant effort is just a first large step.)

    Page 6 of Wu's paper:

    Wang (1996) used the Neighbor Joining method to analyze the relationship among seven major topolects of Sinitic, including Xiamen (which is very close to Taiwanese). Using two different sets of data, the study found that Xiamen is consistently by far the most distant topolect among the seven. Thus, of all the topolects of Sinitic, Taiwanese should be considered closest to Old Chinese.

    This simply doesn't follow. Showing that the basic dichotomy of Sinitic is between Min and (most or all of) the rest doesn't say anything about which of these two branches has innovated less.

    Further, given the timespans involved, we shouldn't even expect that any branch has innovated less than any other. Each language and dialect has retentions and innovations of its own; some may be more conspicuous than others, but global statements like "Lithuanian is more conservative than English" always require plenty of modification.
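    A toy illustration of why "most distant" does not imply "most conservative": pairwise distances between living varieties are the same whichever way the changes ran. The sketch below is not Wang's (1996) data or the Neighbor Joining algorithm itself; the variety names and the ten binary features are invented purely for illustration.

```python
def hamming(a, b):
    """Number of features on which two varieties differ."""
    return sum(x != y for x, y in zip(a, b))


# Invented feature vectors for four hypothetical varieties.
leaves = {
    "outlier": (1, 1, 1, 1, 1, 1, 1, 1, 0, 0),
    "t1":      (0, 0, 0, 0, 0, 0, 0, 0, 1, 0),
    "t2":      (0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
    "t3":      (0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
}


def mean_distance_to_others(name, varieties):
    """Average distance from one variety to all the others."""
    others = [v for n, v in varieties.items() if n != name]
    return sum(hamming(varieties[name], v) for v in others) / len(others)


def most_distant(varieties):
    """The variety with the largest average distance to the rest."""
    return max(varieties, key=lambda n: mean_distance_to_others(n, varieties))


def innovations(ancestor, varieties):
    """How many features each variety has changed relative to an ancestor."""
    return {n: hamming(ancestor, v) for n, v in varieties.items()}


# Two hypothetical ancestors that fit the same leaf data equally well.
ancestor_a = (0,) * 10             # the outlier did all the innovating
ancestor_b = (1,) * 8 + (0,) * 2   # the outlier retained nearly everything

if __name__ == "__main__":
    print("most distant:", most_distant(leaves))
    print("innovations if ancestor A:", innovations(ancestor_a, leaves))
    print("innovations if ancestor B:", innovations(ancestor_b, leaves))
```

    The distance matrix, and therefore any unrooted tree built from it, is identical in both scenarios; only outside evidence about the ancestor can say which branch changed.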

    P. 8:

    Taiwanese is mainly a hybrid of the two largest dialects of Southern Min, namely, Choân-chiu (Quánzhōu 泉州) and Chiang-chiu (Zhāngzhōu 漳州)

    Oh, great – so we shouldn't even expect regular sound correspondences between Taiwanese and Proto-Southern Min in those features that the parent dialects don't have in common. This sounds to me like a potentially major handicap.

    P. 15:

    For example, the accusative of L fungus ‘mushroom’ is fungum, from which is derived It fungo: L acc. fungum > fungu > It fungo. Tw hiuⁿ-ko· 香菇 ‘mushroom’ can be derived in the same way: L acc. fungum > fungu > *fungo > (f- > h-) > Tw hiuⁿ-ko· (with g- > k-) 香菇 ‘mushroom’.

    We aren't told if f- > h- (a common sound change worldwide) or g- > k- (a very, very odd sound change for a language that retains voiced plosives) are meant to be regular sound laws or untestable sporadic developments. But consider the scenario: as basic a word as "mushroom" is casually implied to be an early medieval loan from western Romance. This isn't going to convince any historical linguist.

    P. 16 tells us that Schuessler's OS reconstruction and Pokorny's (1959) PIE reconstruction are cited several times. Why do that, when they aren't actually used? (Pokorny's is drastically outdated, but would still be better than using only attested languages from a strange subset of IE.)

    P. 17:

    The Church Romanization script uses diacritic marks to indicate the tones. The diacritics are retained in this report for orthographic purposes, but they can be ignored without impeding understanding.

    Sinitic tones, even Min tones which aren't cognate to those of Late MS and its descendants, don't come out of nowhere, though. They come from lost syllable-final consonants and from lost features of syllable-initial consonants. That is very important information.

    P. 18:

    pa 爸 ‘father’ (Cf. L pater id.)

    Seriously? A mama-papa word as evidence in historical linguistics? Seriously?

    爺 ‘high official’ (Cf. ON jarl ‘earl’)

    Show that the ^ tone corresponds regularly to final -rl or -r or -l, and I'll be quite intrigued. This is not sarcasm.

    ú 雨 ‘rain’ (Cf. ON úr ‘drizzling rain’)

    The shorter a word, the easier it is to find accidental lookalikes. Even so, however, consider the ´ tone of the Taiwanese word. The correspondence – MSM appears to suggest that ´ is one of the reflexes of an OS final glottal stop. Where is that on the Germanic side? It's not hidden in the vowel length of úr; that's not where Norse vowel length comes from.

    è 裔 ‘descendant’ (Cf. E heir)

    English < Norman French < Latin hērēs. Not only the r, but even the h is etymological.

    BTW, the German Erbe "heir (m.); inheritance (f.)" may seem similar, but it's instead a cognate of orphan, which is an ultimately Greek loanword in English.

    ò 澳 ‘river mouth’ (Cf. ON óss, L ōstium id.)

    Or indeed Latin os "mouth"…

    米 ‘rice’ (Cf. ON bygg ‘the cultivated crop’)

    We are looking here at the famous Min denasalization of initial nasal consonants, a completely regular process as far as I'm aware. The /m/ of Mandarin is older. On the Norse side, the y was /y/, not /i/, and comes from an original /u/ flipped by the /j/ in the next syllable of the underlying verb ON byggja.

    pín (G1W) ‘pin; to pin up’ (Cf. E pin)

    Beginning as it does with p-, the English word can't be inherited from PIE. But explaining it as a loan from Taiwanese doesn't work either, because that couldn't explain the English aspiration.

    phe (G1W) ‘letter’ (Cf. ON bréf ‘letter’)

    That's a late Latin loanword into Proto-West-Germanic (retained in modern German: Brief) that was then passed on to Old Norse. The Latin original is brēvis, which means "short" ( > "short note" > "letter"). Note that Latin ē and ON é are the same thing, and High German ie is its regular reflex.

    thoa 拖 ‘tow, drag’ (Cf. E tow)

    The -w is etymological; it comes from -[ɣ]. There's a nice word family involving tow, tug and tuck by means of Kluge's law…

    鵝 ‘goose’ (Cf. OE gōs id.)

    The change of -/an/- into -/oː/- before fricatives is a North Sea Germanic innovation; German retains Gans to this day.
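    What makes this a sound law rather than a lookalike is that it applies mechanically to every word with the right environment. A minimal sketch, assuming simplified stem spellings and a deliberately crude rule ("an" directly before s, f or þ becomes ō); the function name and the regular expression are illustrative only and ignore the other vowels the real change affected:

```python
import re

# Environment of the toy rule: "an" immediately before a fricative s, f or þ.
NASAL_BEFORE_FRICATIVE = re.compile(r"an(?=[sfþ])")


def north_sea_germanic(form: str) -> str:
    """Apply the simplified rule: drop the nasal and lengthen/round the vowel."""
    return NASAL_BEFORE_FRICATIVE.sub("ō", form)


if __name__ == "__main__":
    for stem in ("gans", "tanþ", "hand"):
        print(stem, ">", north_sea_germanic(stem))
```

    The rule turns gans into gōs and tanþ into tōþ, while hand is left alone because d is not a fricative. A proposed Taiwanese cognate would have to be derivable by rules of this kind, applied without exception.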

    ka 家 ‘home, house’ (Cf. L casa ‘home, cottage’)

    And the second syllable, which is not a morpheme in Latin (or Romance), is left unexplained. Also, isn't the loss of -/j/- behind velars a Min innovation?

    khu 曲 ‘curved, bent’ (Cf. L curvus ‘curved’)

    Up to here, except in phe, Wu has been pretty consistent in lining up Taiwanese aspirates with English aspirates and Taiwanese voiceless non-aspirates with Latin voiceless non-aspirates. Note also how the tone differs from .

    cheng 精 ‘smart’ (Cf. L genius ‘genius’)

    Latin genius did not mean "genius", it meant "spirit" (e.g. genius loci "spirit of the place, locally bound minor deity"); and up until around the end of the Western Roman Empire, it began with [g], not with an affricate!

    chhī 飼 ‘to feed’ (Cf. L cibō id.)

    Unexplained affrication; unexplained aspiration; unexplained loss of the second syllable (the ˉ tone should be a reflex of open syllables, so the second syllable seems to be truly lost).

    是 ‘yes’ (Cf. It , Sp id.)

    Here we're looking at the general southern Sinitic loss of retroflexion; and in Latin, si neither meant "yes" nor was it a demonstrative pronoun (as 是 once was), but it meant "if". ("If" > "if so, then yes" > "yes" is no stranger than colloquial Polish no "well" > "well, yeah" > "yes" or indeed Russian no *"well" > *"well, but" > "but".)

    jiû-tō 柔道 ‘judo’ (Cf. L luctō ‘to wrestle’)

    No, not "to wrestle", but "I wrestle"; -ō is the 1sg present ending. (It's a tradition of Latin lexicography to provide the 1sg present form first.) That takes away half of the similarity to 道… the other half may be taken away by the fact that the Latin -t- is the frequentative suffix.

    P. 25:

    It should be noted that this rule [loss of all but the first syllable in Taiwanese] is not hard and fast as there are many Taiwanese words that are disyllabic which can be traced to disyllabic European words. Examples are:

    This sounds a lot like "if I can line up the later syllables, I declare them cognate; if I can't, I declare them lost in Taiwanese". No explanation of what happens when is given. I'm afraid that's circular logic.

    I'll stop here because it's late at night; I can continue if you like.

    ================

    Did someone in this post ("GA") say that "Sino-Tibetan family… really has a 'Tibeto-Burman' branch"? Not I.

    I was referring to the fact that you listed "Tibeto-Burman" among "language families" in the OP:

    "Perhaps this might suggest cognates in Sino-Tibetan, Tibeto-Burmese, IE, or other language families."

    Other topic, re David Marjanović "another heap of indefensible pseudoscience"—Is there any accessible source that could convince a hobby linguist with some background in Mandarin and Japanese of the reality of Tibeto-Burman and/or Sino-Tibetan? I mean the people living in present-day East Asia sure have a history of intercultural exchange going back thousands of years, so naturally there are similarities to be found in their languages; does anything along those lines prove Tibetan and Mandarin have the same roots, their speakers common ancestors?

    It's not easy, because a reconstruction as for PIE doesn't exist yet; the family is so large that many languages, indeed entire branches, are only being described right now. One line of current work looks into teasing apart the prefixes and suffixes that have generated Matisoff's "allofams"; this paper is an example.

    However, there is hope for a surprisingly IE-like demonstration from shared morphology. The hundreds of ST languages span the full range from polysynthetic to isolating; traditionally, it used to be thought that the isolating groundplan is original, but there's evidence that some (not all!) of the polysynthetic morphology of the Kiranti languages in Nepal and the Rgyalrongic languages in Sìchuān is actually cognate, and that the few affixes of Old/Written Tibetan and the reconstructed affixes of Old Chinese are highly reduced remains of it. Of course, nobody knows if the last common ancestor of all these languages was in fact Proto-Sino-Tibetan… I recommend this paper as an introduction to the state of this research and to its difficulty.

    linguists are on the whole incredibly reluctant to accept the reality that vowels are nothing but a surface phenomenon.

    That's because no known or reconstructed language distinguishes fewer than two phonemic vowels. MSM has just two; MS had a lot more, and OS quite evidently had six. PIE had two if you manage to interpret *a away (which requires a few contortions, but not downright impossible ones) and if you interpret every single *i and *u, even in reduplicants, pronouns, noun roots and verb roots that don't participate in ablaut, as */j/ and */w/; there are minimal pairs like *nokʷts "evening" (nominative) and *nekʷts "evening's" (genitive).

    Reminds me of the old definition of historical linguistics as "a science in which the vowels count for nothing and the consonants for very little."

    That's Voltaire's quip about etymology as it was done back then – before historical linguistics existed.

  31. David Marjanović said,

    August 9, 2017 @ 3:40 pm

    Blockquote!!! The quote from p. 15 ends before "We aren't told".

  32. Chris Button said,

    August 9, 2017 @ 7:32 pm

    @ David Marjanović

    I would recommend keeping Pulleyblank's Old Chinese reconstructions at a safe distance from those of Starostin/Zhengzhang/Schuessler/Baxter&Sagart. Pulleyblank's work certainly has a lot of contentious areas, but is nevertheless vastly superior in adequately accounting for the rimes and their evolution into Middle Chinese.

    The Indo-European "vowel" /e/ (with its ablaut variant /o/) has actually been interpreted on several occasions as schwa (with an ablaut variant /a/). That this happens to coincide with the alternation of schwa with /a/ in Old Chinese most likely has nothing to do with a common origin for PIE and OC (as Pulleyblank seems to have thought), but rather with the syllable (i.e. schwa) being the basic building block of language.

  33. Zoner said,

    August 9, 2017 @ 7:41 pm

    The YouTube version of the video works better: https://www.youtube.com/watch?v=Pd6AOGCMYF8

    I take it from the music (Snoop Dogg's Gin and Juice), the GTA5 in the title, and the minimap in the lower left that the uploader is saying it's like a game of Grand Theft Auto? That series does allow for weird police interactions.

  34. zachary said,

    August 10, 2017 @ 1:35 am

    Hi Professor Mair, I just wanted to express my appreciation for your blog and posts like these! I actually stumbled across two posts from here quite unintentionally. A couple of weeks ago, I had been wondering if there was any reason behind why 蛮 was used for “very; quite” in Shanghainese, and came across your post from May 27, which I found very enlightening. Just today I was thinking about 尴尬 (a favorite word of mine as well) and had been wondering what exactly the etymology was when I found your blog post about it! (very fascinating info about its origin, by the way!) What’s more is that when I looked at your name, I realized I had read a paper by you a couple of years back about wontons and hundun! I’m glad that you are answering questions I never knew I had!

    On a somewhat relevant note, I recall reading from “Modern Chinese: History and Sociolinguistics” by Ping Chen, that 尴尬 was but one of several Wu Chinese borrowings into modern written Chinese, along with: 垃圾 (supplanting 脏土 in Northern Mandarin), 陌生, 蹩脚 and 名堂. However, the author didn’t really go into much detail about these borrowings. I was wondering perhaps if you knew more about these ‘loanwords’, or of other borrowings that have subtly crept into modern Mandarin.

    Anyway, I really enjoy your blog in general and especially these etymology/philology posts, and I look forward to reading the rest of your posts! Thank you!

  35. Rodger C said,

    August 10, 2017 @ 6:51 am

    Something I've been wondering: Is it possible that a Taiwanese would want to compare Taiwanese with Germanic because of some school- or family-transmitted echo of the German-Japanese WWII alliance, i.e. "Germanic languages are the purest IE languages"?

  36. Jichang Lulu said,

    August 10, 2017 @ 1:03 pm

    Parler gaga also means French as spoken in and around Saint-Étienne. Apparently, it used to refer to (the local form of) franco-provençal, from which modern gaga has inherited some vocabulary. Like the more common meaning of gaga, it could be imitative (a bit like 'babble'). I don't think the parler gaga is particularly [g]-rich. One thing it does have a lot of is [ɑ] (essentially any /a/ unless before /r/).

    French ganga is also a bird (the sandgrouse?).

    The Trésor's etymology comes from the Französisches Etymologisches Wörterbuch.

  37. Jichang Lulu said,

    August 10, 2017 @ 1:16 pm

    @David Marjanović

    On 连绵词 (reduplicating monomorphemic disyllables/binomes), there's e.g. Jian Li's thesis. Also this.

    尴尬 ganga certainly belongs to the lianmianci class, although not to the 'inserting' type, but rather the 'alliterating' (双声) one, in which both syllables have the same initial.

  38. Jichang Lulu said,

    August 10, 2017 @ 1:38 pm

    @Chris Button

    I remember reading some of your anti-vowel statements elsewhere, perhaps in some of your work on Northern Chin. I found them intriguing, and kept meaning to ask you about your views on the 'primacy of the syllable'. Do you mean you expect ancient languages to be more likely to be analysable as having no phonemic vowels? Or that analyses with fewer or no vowel phonemes are generally desirable, even for languages with lots of surface vowels?

  39. Laurent Sagart said,

    August 11, 2017 @ 11:49 am

    @Chris Button,

    There exists a methodology for quantitatively evaluating claims such as "X's [Old Chinese reconstruction] system is superior to Z's in adequately accounting for the rhymes [of Old Chinese]": see the recent squib by Mattis List in Lingua Sinica (https://www.academia.edu/33593694/Vowel_purity_and_rhyme_evidence_in_Old_Chinese_reconstruction). There List compared the systems of Karlgren, Wang Li, Li Fang-kuei, Zhengzhang, Starostin, Pan, Schuessler and Baxter-Sagart. He did not include Pulleyblank's system because he does not have a list of Pulleyblank's reconstructions for Shi Jing rhyme words. If you had such a list, it would be useful to compare. Perhaps you could make it available here, or send it to me privately, and I would forward it to List.

  40. Victor Mair said,

    August 11, 2017 @ 4:31 pm

    From William Baxter:

    In my view, the main reason for preferring a six-vowel reconstruction of Old Chinese to Pulleyblank's two-vowel reconstruction with *a and *ə is that the six-vowel reconstruction makes correct predictions about OC rhyming, while Pulleyblank's system fails to do so. Scholars of the Qīng dynasty came up with an analysis of Shījīng rhyming that is still widely used, with only minor modifications, in studies of early texts. Pulleyblank's two-vowel hypothesis was based on this traditional analysis.

    The hypothesis that OC had a six-vowel system arose largely from an analysis of the distribution of initials and finals in Middle Chinese; for details of the argument, see Baxter (1992: 236-258) and Baxter & Sagart (2014: 198-211). The six-vowel reconstruction predicts that OC had rhyme distinctions that were overlooked in the traditional analysis. For example, in the traditional analysis, the following words are all assigned to the 月 Yuè rhyme group:

    渴 kě, MC khat ‘thirsty’
    滅 miè, MC mjiet ‘destroy’
    掇 duó, MC twat ‘pick, gather’

    Most traditional reconstructions follow the traditional analysis and reconstruct the same or at least similar main vowels in all three words: Karlgren's reconstructions are *k’ât, *mi̯at, and *twât; Li Fang-kuei's are *khat, *mjiat, and *tuat. But the six-vowel reconstruction requires that we reconstruct them with three different main vowels: *-at, *-et, and *-ot respectively. Rhyming practice is not always strictly based on phonology, but as a default, we would expect that words that rhyme should have the same main vowel, and that *-at, *-et, and *-ot should not rhyme with each other. This implies that the 月 Yuè group of the traditional analysis was actually three different rhyme groups, *-at, *-et, and *-ot, and that these rhyme distinctions were overlooked in the Qīng scholars' analysis.

  41. Victor Mair said,

    August 11, 2017 @ 5:02 pm

    From William Baxter, cont.:

    The core of Baxter (1992) is an argument that the predictions of the six-vowel system about rhyming are correct: for example, that *-at, *-et, and *-ot actually do not rhyme with each other (see Baxter 1992: 389-413). There are a few irregular rhymes that do appear to mix them, but far fewer than we would expect if they really did rhyme normally, and no more than the mixed rhymes we find among well-recognized rhyme groups.
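    The logic of "far fewer than we would expect if they really did rhyme normally" can be sketched as a comparison between an observed and an expected mixing rate. What follows is only a toy illustration, not the statistical procedure of Baxter (1992): the rhyme pairs are invented, and the baseline assumes that, if the three finals rhymed freely, partners would be drawn at random in proportion to how often each final occurs. The function names are hypothetical.

```python
from collections import Counter


def observed_mixed_rate(pairs):
    """Share of rhyme pairs whose two words have different reconstructed finals."""
    mixed = sum(1 for a, b in pairs if a != b)
    return mixed / len(pairs)


def expected_mixed_rate_if_free(pairs):
    """Share of mixed pairs expected if finals were paired at random,
    in proportion to their overall frequency in the data."""
    counts = Counter(final for pair in pairs for final in pair)
    total = sum(counts.values())
    same = sum((c / total) ** 2 for c in counts.values())
    return 1.0 - same


# Invented toy data: each tuple is one rhyming pair, labelled by final.
toy_pairs = (
    [("at", "at")] * 4
    + [("et", "et")] * 3
    + [("ot", "ot")] * 2
    + [("at", "et")] * 1
)

if __name__ == "__main__":
    print(f"observed mixed rate: {observed_mixed_rate(toy_pairs):.3f}")
    print(f"expected if rhyming freely: {expected_mixed_rate_if_free(toy_pairs):.3f}")
```

    If the observed rate sits far below the free-rhyming baseline, as in this toy data, the distinction between the finals was evidently respected by the poets; a real test would also need significance estimates and a comparison with mixing among well-recognized rhyme groups.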

    And in some cases it can be shown that the rhyming irregularities are due to late changes in the text. For example, in the received versions of Lǎozǐ 39, there is a six-line rhymed passage where all the words are to be reconstructed with *-at except for the fifth line, where the rhyme word is 滅 miè, which in the six-vowel system has to be reconstructed with *-et. But abundant independent evidence (including excavated texts) shows that the fifth line is a later addition, from a time when the phonological system, and probably the rhyming conventions, were different. Even without this evidence, the six-vowel system tells us that the fifth line is suspect; but traditional reconstructions like those of Karlgren and Li Fang-kuei do not indicate anything suspicious about the rhyme of this line.

    Pulleyblank never published a comprehensive reconstruction of Old Chinese involving his two-vowel system, including both onsets and rhymes and their possible combinations. His hypothesis that OC had a two-vowel system, with *a and *ə, is set out in

    Pulleyblank, Edwin G. 1977. The final consonants of Old Chinese. Monumenta Serica 33. 180–206.

    But this paper only gives reconstructions for OC finals; in some cases the development of the final is said to depend on features of the initial consonant, but there is no explicit reconstruction of them.

    Based on this paper, he would probably have reconstructed 渴 MC khat as "*khát", and 滅 MC mjiet perhaps as "*mjàt", but there is no explanation of how to reconstruct syllables like 掇 MC twat. Based on conversations with Pulleyblank several decades ago, I think he would argue that any rhyming distinction between our *-at and our *-et is the result of an allophonic fronting of *-at under the influence of some palatal element in the onset, and that the separate rhyming of our *-ot is due to allophonic rounding under the influence of some labial element in the onset (in the paper, he writes a superscript "j" before the vowel to indicate the palatal element, and a superscript "w" to indicate the labial element). But this is an additional stipulation, and without an explicit reconstruction of what these palatal or labial elements were, the hypothesis can't be evaluated. It also complicates the relationship between phonology and rhyming, since it assumes that rhyming was based on phonetic rather than phonological distinctions.

    Another detail of OC rhyming is that there are rather frequent irregular rhymes between our *-et and our *-it. These are irregular in anyone's reconstruction (they belong to the traditional 月 Yuè and 質 Zhì groups respectively), but in the six-vowel reconstruction there is a natural explanation: *-et and *-it are similar in that they both have front vowels, and because there were not many available rhyme words with *-et, approximate rhymes between *-et and *-it were evidently tolerated. In Pulleyblank's reconstruction these are "*-(j)at" and "*-əc" respectively, and any explanation would have to be more complex.

    Pulleyblank also gives little in the way of positive reasons to reconstruct only two main vowels. He says "there is good reason to think that it [the two-vowel system] has been the underlying pattern in Chinese at all periods", but I'm suspicious of any assumption that particular languages are likely to maintain such persistent characteristics over long periods of time: do the Italic or Germanic branches of Indo-European show any such long-term underlying patterns? His examples of supposed ablaut are rather few and scattered, and if they are morphological in origin they probably reflect traces of the morphology of an earlier period, no longer productive at the Old Chinese stage. In any case ablaut does not require us to reconstruct a two-vowel system.

  42. Chris Button said,

    August 11, 2017 @ 9:58 pm

    @ Jichang Lulu

    It is certainly no coincidence that schwa (i.e. the syllable) is all that is left when Proto-Indo-European and Old Chinese are reconstructed.

    While some individual languages may be more amenable to a vowelless/vertical-vowel analysis than others (albeit often involving internal reconstruction), this is generally only on an underlying phonological level and certainly not on a surface phonetic level when the vowels can be clearly heard. In the many languages that show no trace of such an analysis, it only becomes apparent from an external comparative perspective when, as the PIE and OC reconstructions attest, eventually it becomes inescapable.

    @ Laurent Sagart & Bill Baxter

    Firstly, I don’t think my use of the term “vastly superior” showed the due respect that you both know I have for your work. I should have said “better in my opinion”.

    As for OC rimes, I wasn’t simply referring to the rhyming patterns in the Shijing, but more broadly rimes versus syllable onsets.

    Pulleyblank’s alternation of schwa and “a” provides an all-pervasive regular pattern of ablaut across the OC lexicon. On the other hand, six-vowel systems essentially leave us with extensive “allofamic” variation (to borrow Matisoff’s term). While Pulleyblank did not compile an extensive list, I have been engaged in the process of doing so while working on my dictionary and am happy to share it via e-mail. The evidence for this often relies on inscriptional evidence well before the time of the Shijing.

    Specifically as regards Shijing rhyming, I believe the statistical evidence to be misleading. In the simplest possible terms, even if something like /wa/ and /ja/, for example, are treated as /e/ and /o/ because in the majority of cases they do not rhyme with /a/, we still need to account for the cases when they do. By not forcing phonemic /o/ and /e/ onto the system, we can see that for the Shijing compilers, something like /wa/ and /ja/ tended to be phonetically close to something like /o/ and /e/ due to assimilation of the glide and the vowel (hence the tendency to keep them separate), but that the phonological rhyme was crucially still the “a” component of all three (hence the ability to rhyme with each other as well). As a side note, it is far easier to account for straight up surface phonetic assimilation of a glide with a vowel in a /wa/~/o/ variation than entertaining notions of later phonological vowel breaking from /o/>/wa/ in the evolution of Middle Chinese without any of the conditioning environments we have in say the evolution of Spanish (as opposed to Portuguese) from Latin.

    The glides /j/ and /w/ palatalized and labialized back-codas which accounts for their clear separation in the rhyme categories in all reconstructions. An answer as to where the glides /j/ and /w/ originally came from may be provided by a comparison with Proto-Indo-European. Again in the simplest possible terms, Proto-Indo-European /j/ and /w/ pattern as the other sonorants (e.g. /l/, /n/, /m/ etc) in being able to occur as the syllabic nucleus (although only /j/ and /w/ afford us with the transcriptional flexibility of /i/ and /u/ and hence the perception of high vowels in PIE). In Old Chinese, only the glide category of sonorants could occur as the nucleus such that a syllable of the type CjC was possible (in which C = consonant, and schwa is inherent in the glide that alternates with an ablaut form CjaC) but something like CnC was not.

    Finally, I don’t quite follow the notion that phonological /-it/ and /-et/ are more likely to rhyme than phonological /-əc/ (which I derive from earlier /-jət/ and /-əkj/) and /-jat/ (which did not merge with /-akj/) when the surface phonetic realizations based on feature assimilation are properly taken into account. The evolution of similar rhymes in Burmese provides a good comparison here.

  43. Victor Mair said,

    August 12, 2017 @ 8:14 pm

    From Tsu-Lin Mei:

    Thanks for the e-mail, especially the information on Matisoff, Sino-Tibetan Etymological Dictionary and Thesaurus. (STEDT).

    · Chris Button … should read Gong Hwang-cherng “A comparative study of the Chinese, Tibetan and Burmese Vowel Systems”, BIHP 51.3:455-490, in which Gong Hwang-cherng showed that the 4-vowel system of Old Chinese (according to F.K. Li’s (1971) reconstruction of Old Chinese) corresponds to the 3-vowel system of Old Burmese and the 5-vowel system of Written Tibetan. He gave 172 cognate sets, most of which I am sure will turn up in Matisoff’s STEDT.

    · The issue is not between the 6-vowel system of Baxter and Sagart, ZhengZhang Shang-fang etc. and the 2-vowel system of Pulleyblank (which nobody defends). The issue is between F.K. Li’s 4-vowel system and Baxter’s 6-vowel system. In the 6-vowel system, it is inevitable that a single rhyme group 韵部 will contain finals with different vowels. This is counter-intuitive to most of us. Gong Hwang-cherng 1995 “The System of Finals in Proto-Sino-Tibetan” produced 278 cognate sets. I have yet to see a comparable cognate set for B and S or ZhengZhang Shang-fang. If they are confident about their reconstruction, they should not hide their light under the bushel.

    · On the basis of Schuessler, 2007 ABC Etymological Dictionary of Old Chinese, 30-31, with added material from Chepang, e.g. ‘shit, excrement’ kli? : OC 屎 *hljidx, WB khliy, WT lci < *hlji, I was able to show that Old Chinese rising tone derives from a final glottal stop -?. This is a theory announced by Pulleyblank 1962, and not proven until Schuessler (2007) and Mei (2017) came along. In a few days I may be able to send you a pdf version of my 8-page paper.

  44. Chris Button said,

    August 12, 2017 @ 9:54 pm

    @ Tsu-Lin Mei

    I actually wrote about Gong's analysis of the Burmese rhymes on the pages numbered 38-42 (p.42-46 of the pdf itself) here:

    http://stedt.berkeley.edu/pubs_and_prods/STEDT_Monograph10_Proto-Northern-Chin.pdf

    The analysis might not be entirely to your liking, but I hope it provides food for thought at least.

  45. Laurent Sagart said,

    August 13, 2017 @ 2:06 am

    @ Tsu-Lin Mei

    A list of about 300 Written/Old Tibetan, Burmese and Old Chinese cognates, with cross-references to Gong's comparisons, and with sound correspondences based on the Baxter-Sagart 6-vowel reconstruction, was published by Nathan Hill a few years ago. It can be downloaded from
    https://www.academia.edu/3208877/The_six_vowel_hypothesis_of_Old_Chinese_in_comparative_context

  46. Victor Mair said,

    August 13, 2017 @ 7:16 am

    From Tsu-Lin Mei:

    In Gong (2002:119) you will find OC *liap > *diap > diep “butterfly”, WT phye-ma-leb “butterfly”. 胡蝶 may be di-syllabic, but from the perspective of ST comparative linguistics, the etymon is monosyllabic, i.e. **liap. All of Gong’s 278 cognate sets consist of monosyllables. You can draw your own conclusion.

    In F.K. Li’s 4-vowel system, there are loose ends, namely ia, ua and i + schwa. Perhaps when these compound vowels are worked out, Li’s system may turn out to be 6-vowel after all.

    In American linguistics, there are two traps. One is generative vs. structural. A young scholar can spend a decade in this controversy without getting any solid result. The other is the 4-vowel and 6-vowel controversy in Old Chinese reconstruction. I am just an ignorant Sinologist trying to get some positive results, which may contribute to the solution of 4-vowel vs. 6-vowel. I am quite sure there are better equipped scholars dealing with the fundamental questions in the PST vowel system. I just hope they will turn up some positive results instead of learned, empty talk.

  47. Chris Button said,

    August 13, 2017 @ 2:39 pm

    Further to Prof. Mei's and Prof. Mair's comments, I would just like to make it clear that I have absolutely nothing against the 6 vowel hypothesis per se. On the contrary, I actually think the statistical analysis it entails brilliantly elucidates the rhyming practices of the time and is an incredibly valuable contribution to scholarship. However, what I question is the rigid phonemicization of it as something representative of the underlying structure of Old Chinese phonology rather than a surface phenomenon which should instead be associated with the inherent flexibility of rhyming.

    In extremely broad terms (i.e. missing huge amounts of very important detail – so please nobody criticize me for this very general comment), a Pulleyblank inspired vowel system can be sort of seen as underlying something like a Li Fang-Kuei vowel system which in turn can be sort of seen as underlying something like a Baxter-Sagart vowel system which overtly represents (albeit rather rigidly) more surface phenomena. The question as to which reconstruction to go with then really depends on the intended use (a deep understanding of the phonological structure or a better representation of what things might have actually sounded like in reality) and, in this regard, they all have their own merits.

  48. David Marjanović said,

    August 13, 2017 @ 5:11 pm

    Two days ago I wrote a long comment, which never got through – no error message, no message that it went to moderation (which it evidently didn't), nothing. Perhaps my connection is to blame. Before I try again, however, I'll try to comment on even more recent issues.

    Pulleyblank also gives little in the way of positive reasons to reconstruct only two main vowels. He says "there is good reason to think that it [the two-vowel system] has been the underlying pattern in Chinese at all periods", but I'm suspicious of any assumption that particular languages are likely to maintain such persistent characteristics over long periods of time: do the Italic or Germanic branches of Indo-European show any such long-term underlying patterns?

    In short, no, not remotely.

    Classical Latin had 5 long and 5 short vowels plus a few diphthongs; all Romance languages have eliminated the length distinction in one of several possible ways, almost all of which involve mergers between vowels of different qualities, not to mention monophthongizations and sometimes diphthongizations. French also developed a new length distinction and then lost it again (stepwise). Romance languages today have between 5 and more than 15 vowels.

    Proto-Germanic has been reconstructed, for the last several decades, with 4 short and 4 long vowels, to which several nasal vowels and maybe an overlong one can be added as well as a few diphthongs. The modifications from that to the Gothic system were rather minor, but Northwest Germanic and its descendants went through large upheavals including several kinds of umlaut phenomena (partial assimilations of stressed vowels to following vowels, generating new phonemes when the following vowel was lost or contracted with the stressed one), loss of nasality (occasionally regained later), changes to the length system, mono- and diphthongizations; it's now hard to find two German dialects, let alone two North Frisian dialects, whose vowel systems are the same, even though all extant Germanic vowel systems this side of Tok Pisin are, to the best of my knowledge, larger than that of Proto-Germanic.

    It is certainly no coincidence that schwa (i.e. the syllable) is all that is left when Proto-Indo-European and Old Chinese are reconstructed.

    Lehmann tried in the 1970s to perform an internal reconstruction on PIE to find out what pre-PIE could have been like before ablaut developed. The result was a one-vowel system. It has pretty much been ignored because internal reconstruction beyond the comparative method has too many degrees of freedom; it's simply unconstrained on one end. (Unless, of course, you allow as a constraint the fact that all attested languages, inside and outside of IE, seem to distinguish at least two vowels.)

    In the simplest possible terms, even if something like /wa/ and /ja/, for example, are treated as /e/ and /o/ because in the majority of cases they do not rhyme with /a/, we still need to account for the cases when they do. By not forcing phonemic /o/ and /e/ onto the system, we can see that for the Shijing compilers, something like /wa/ and /ja/ tended to be phonetically close to something like /o/ and /e/ due to assimilation of the glide and the vowel (hence the tendency to keep them separate), but that the phonological rhyme was crucially still the “a” component of all three (hence the ability to rhyme with each other as well).

    This is an interesting idea.

  50. Laurent Sagart said,

    August 14, 2017 @ 4:39 am

    @ Tsu-lin Mei

    One should not too readily assume that Written/Old Tibetan forms with complex onsets (like gsum 'three', dmag 'war, army, soldier') were monosyllables. They may well have had a vowel between the first two consonants, ignored by the script because it was not contrastive: [gəsum], [dəmag] etc. Other TB languages (e.g. rGyalrong, Dulong) have vowels in the same position, often in the same words as WT. Bill Baxter and I suppose that this type of structure also existed in OC, alongside vowelless clusters and true monosyllables.

  51. Chris Button said,

    August 14, 2017 @ 9:58 am

    @ David Marjanović

    On your earlier comment about suffixes versus allofams: the PST suffix /s/ has a variety of reflexes in Northern Chin which (simplifying some of the associated sound changes, which get quite complex) range from a tone category corresponding to the OC qu-sheng through to /t/, /ʔ/ and /k/. All of the different reflexes are phonologically conditioned, so they clearly come from one original source.
