Sino-Semitica: of gourds, cassia, and hemp and Old Sinitic reconstructions


In a personal communication, Chris Button recently reminded me that I had once (more than two decades ago) written about the possible relationship between Semitic and Sinitic words for "gourd":

You might remember a while back I was asking you about your Southern Bottle Gourd Myths paper.

Recently, I've been working a little more on the 瓜 series in my dictionary and have ended up with it as an etymological isolate (bar the obvious relationship with 壺). So, I started looking for an external origin. Your note on the Arabic form qarʿa jumped out at me as being strikingly similar to my reconstruction of 瓜 as qráɣ and very supportive of the areal associations you outline in the paper.

That would add to the other two Semitic loanwords 麻* and 桂** here.

The merger of *-r with *-l in Old Chinese means that 麻 *mrál could go back to an earlier *mrár, which then aligns very nicely with the Semitic source and supports Prof. Mair's suggestion.

We already have a precedent for a borrowing of this nature in 桂 *qájs "cinnamon, cassia", which could regularly go back to *qjáts and is likely associated with Hebrew qetsia "cassia".

source of last two ¶s

[VHM:  *má ("hemp")]

[VHM:  **guì ("cinnamon, cassia")]

I had an old, learned German friend named Elfriede Regina (Kezia) Knauer (1926-2010) who was very much aware of the Semitic origins of her nickname and often asked me about its Sinitic parallels (see here, here, here, here, and here).

Kezia: from Hebrew קְצִיעָה (“cassia tree”). Compare cassia. From Latin cassia (“cinnamon”), from Ancient Greek κασσία, κασία, κάσια (kassía, kasía, kásia), from Hebrew קְצִיעָה (qəṣīʿā), from Aramaic קְצִיעֲתָא (qəṣīʿătā), from קְצַע (qṣaʿ, “to cut off”) (source).

Returning to the matter of the relationship between Semitic and Sinitic words for "gourd", let's start with the Semitic side:

Arabic qar'a yabisa ("dry gourd") > "calabash", cf. Persian kharabuz (source); so far as I can determine, the second half of these words means "dry" (< Arabic yābisah).

There are many different ideas about the etymology of Persian خربز (xarboz, "melon"), etc., for which see here, but I believe the one I have given just above correctly accounts for its origin.

قَرْع (qarʿ) m (collective, singulative قَرْعَة‎ (qarʿa)) ("pumpkin; gourd") < Classical Syriac ܩܪܐܐ‎ (qarrˀā) (source).

From Joe Lowry:

The word is qar‘ (root:  q-r-‘:  qaf-ra’-‘ayn) ( قرع ).  The ‘ayn usually makes one think that it is Semitic.  Modern Hebrew for pumpkin is qaraa ( קרא ), without the ‘ayn in the final position, but I don't know why it's lacking the ‘ayn.  There is some interchangeability between ‘ayn and aleph in Hebrew and Aramaic, but whether that holds here between Arabic and Hebrew I don't know.  That is to say, I don't know whether these are cognates in Arabic and Hebrew in the sense of having a common origin in Semitic.  This is a decidedly non-scientific assessment.  A quick consultation of Brown-Driver-Briggs (Biblical Hebrew) suggests it's not in the Bible, but it is in Jastrow's dictionary of Rabbinic Hebrew and Aramaic.

One more thing:  It's in the Syriac translation of Jonah 4:6 (Peshitta), but the Hebr. in that passage (qiqiyon) is not cognate with that word.

Our English word "carboy", in my estimation, probably comes from the same root, though that's not exactly how most dictionaries derive it.  See, for example, The American Heritage Dictionary of the English Language (5th ed.), where we find:

Persian qarābah, from Arabic qarrāba, big jug, from qarraba, to bring near, derived stem of qaruba, to be near

From there, we are directed to the entry for the triliteral root "qrb" in the appendix of Semitic roots at the back of the dictionary, p. 2076a, where we find:

qrb To be(come) near, draw near.

    1. carboy, from Arabic qarrāba, big jug, from qarraba, to bring near, derived stem of qaruba, to be(come) near.

I was hoping that there would be a more suitable Semitic root for the "qr" part.  Perhaps, though, if the Semitic word for "gourd" itself comes from an earlier, non-Semitic source, which I think is quite possible given the deep antiquity of human use of the plant as a container (see below), then we needn't expect to find a Semitic etymon for it.

Now the Sinitic side:

guā 瓜 ("gourd; melon")

Old Sinitic

(Schuessler 2007:  264) /*kwrâ/

(Baxter-Sagart): /*kʷˤra/

(Zhengzhang): /*kʷraː/

This old word already appears in the Western Zhou bronze inscriptions (first half of the 1st millennium BC), but apparently not in the oracle bone inscriptions (c. 1200 BC).

To complicate, but also to enrich, the matter, it is my opinion that the common vernacular term for "gourd; calabash; cucurbit", húlu 葫蘆, is essentially the same morpheme as guā 瓜 ("gourd; melon"), though written disyllabically.  Indeed, in my "Southern Bottle-Gourd (hu-lu) Myths" paper (p. 188 and passim), I list a dozen or more different ways of writing this morpheme, most of them disyllabic.

I should mention a rule of thumb to which I adhere in the philological study of old Chinese texts, viz., if a Sinitic morpheme has multiple orthographic variants, especially if many (or all) of them are disyllabic, then chances are fairly good that the word was borrowed from a foreign language or entered the Sinitic mainstream from a topolectal source.

For copious examples of Classical Chinese (or Literary Sinitic) polysyllabic words, see the massive two-volume Cí tōng 辭通 (Comprehensive Phrases; 1934), one of my favorite old dictionaries.

One of the variant forms is húlú 壺蘆; because the word names a plant, this form itself has a further variant in which the first character is written with the grass radical on top.

Note, however, that already the Shījīng 詩經 (Poetry Classic; ostensibly 6th c. BC, but extant editions date from at least four centuries later), in poem no. 154, has just hú 壺 with the meaning "bottle gourd".

A few Old Sinitic reconstructions:

húlu 葫蘆 ("bottle gourd")

(Baxter-Sagart): /*[ɡ]ˤa  C.rˤa/

(Zhengzhang): /*qʰaː|ɡaː  raː/

hú 壺 ("bottle gourd; flask")

(Schuessler 2007:  281) /*gâ/

(Baxter-Sagart): /*[ɡ](ʷ)ˤa/

(Zhengzhang): /*ɡʷlaː/

The origins of the English word "gourd" are far from clear, but I have always suspected that it derives from the same source as the Semitic words discussed above.

Cf. "gourd" (n.) c. 1300, from Anglo-French gourde, Old French coorde, ultimately from Latin cucurbita "gourd," which is of uncertain origin, perhaps from a non-IE language and related to cucumis "cucumber" (see cucumber). Dried and excavated, the shell was used as a scoop or dipper.


A brief note on the botanical history of the worldwide spread of gourds reveals that the transmission of important plant species occurred much earlier than historical records can document, and is hard to trace even through archeological evidence.

L. siceraria or bottle gourd, thought to have originated in southern Africa, was brought to Europe and the Americas very early in history, being found in Peruvian archaeological sites dating from 13,000 to 11,000 BC and Thailand sites from 11,000 to 6,000 BC. A study of bottle gourd DNA published in 2005 suggests that there are two distinct subspecies of bottle gourds, domesticated independently in Africa and Asia, the latter approximately 4,000 years earlier. The gourds found in the Americas appear to have come from the Asian subspecies very early in history, although a new study now indicates Africa. The archaeological and DNA records show it is likely that the gourd was among the first domesticated species, in Asia between 12,000 and 13,000 years before present, and possibly the first domesticated plant species.

A major point of my "Southern Bottle-Gourd (hu-lu) Myths" paper (pp. 189-90) (see "Readings" below) is that there was originally a single origin of the word for "gourd" and that it spread with the plant to the far reaches of the globe already in prehistoric times.  Nothing in my current reflections on the subject would make me want to change my mind on that point.



From John Huehnergard [2/17/20]:

Hebrew qǝṣiʕā is unlikely to refer to Cinnamomum cassia, but rather to a plant found in Ethiopia or Arabia; a recent discussion of the Hebrew word is Benjamin J. Noonan, Non-Semitic Loanwords in the Hebrew Bible (2019) 196-97 (who, along the way, refutes a proposed Chinese etymology for qǝṣiʕā). As Noonan also rightly notes, the word is not found in Aramaic apart from Jewish sources referring to the Hebrew word, so it's not a real Aramaic word.

As for calabash, neither the OED nor the American Heritage Dictionary gives a Semitic etymology (and my Arabic dictionaries, at least, don't offer the phrase qarʕa yabisa).

Still, interesting stuff.




  1. AntC said,

    February 2, 2020 @ 6:45 am

    Latin cucurbita seems to be the origin also of French courge(tte).

    Late Latin cucutia, also of unknown provenance, is the source of zucchini. [source: etymonline]

    'Gourd' seems to have a well-travelled root, like aubergine/melanzana.

  2. Keith said,

    February 2, 2020 @ 3:19 pm

    being found in Peruvian archaeological sites dating from 13,000 to 11,000 BC and Thailand sites from 11,000 to 6,000 BC

    So it looks like the fruit in question was in Asia and in the Americas well before even pre-modern times… unless we're looking for a Neanderthal etymology for the word, this looks like a dead end.

    Likewise, anything to do with recent Latin-derived words like the French "courge" (and its diminutive "courgette", which in the US is known as "zucchini", even in the singular) seems to me to be pointless.

    I remember seeing the etymology of "carboy" when I first met these in the chemistry lab at school, and wiktionary tells us that the similar word "carafe" is "probably from Arabic غُرْفَة‎ (ḡurfa, “cup or dipper”), from غَرَفَ‎ (ḡarafa, “to ladle”)".

  3. AntC said,

    February 3, 2020 @ 4:04 am

    Keith, I'm not seeing why "pointless": it's not obvious to me that 'courge' and 'zucch-' could be from the same source.

    And indeed etymonline gives two distinct words dated differently in Latin, and both thought to be from non-Italic/non-IE sources. Neither does it posit the two words are related: you'd think it would. So how did they get into Latin, and why aren't there cognates in other (non-Romance) IE languages? Or are there?

    I'm not asking to go back to proto-Neanderthal or proto-south-African.

  4. Chris Button said,

    February 3, 2020 @ 8:50 pm

    I think Keith's comment is about the difficulty in ascertaining a clear provenance for the word.

    In terms of an Old Chinese association, the difficulty here (as with the many other cases of loanwords) is with the quality of the reconstructed Old Chinese forms. In my opinion, the field is moving in an unfortunate direction based on faulty principles and forced a priori assumptions, and the correspondences between the reconstructed Old Chinese forms and their putative loanword sources are looking less persuasive as a result.

  5. Victor Mair said,

    February 3, 2020 @ 11:33 pm

    Chris Button has hit the nail on the head. It is precisely because of the faulty premises for Old Sinitic reconstructions that I have embarked upon this long-running series of comparisons with persuasive loanwords.

    Two of the most brilliant historical linguists of Sinitic, Jerry Norman and South Coblin, at the peak of their careers, eschewed attempts to engage in large-scale reconstruction of Old Sinitic and instead turned their attention to the history of the topolects. Some of their most talented followers have taken the same path.

    For at least four decades, I have adopted a different strategy. I have striven to identify loanwords for cultural and technological phenomena that came to East Asia from abroad and that are clearly attested in the archeological, historical, genetic, visual, etc. record. In this way, I have aimed to provide reliable data points upon which to anchor and revise the not-so-solid reconstructions. In other words, I'm hoping that those who engage in the reconstruction of Old Sinitic sounds will take into account the realia of the physical and historical record and the foreign words attached to them.

  6. Victor Mair said,

    February 9, 2020 @ 2:41 am

    From Diana Shuheng Zhang:

    As for 瓜, we know that it is a 合口 (labio-) sound from Middle Chinese on. The best guess for OC has to be *kʷ(r)á. The -r- is put in parentheses since labiovelars (Kʷ-) and medial -r- usually can't co-occur. We know that it has to end in *-a, though, because of its xiesheng derivations: 孤 *kʷá, 狐 *gʷá (the acute mark denotes a type A vowel) [Note 1], etc.

    The reason that we nevertheless find so many co-occurrences of *Kʷ- and *-r- in the Baxter-Sagart 2014 reconstructions is that they simply put an *-r- at every place where there could possibly be one, even with the slightest chance, in order to foreground this pathbreaking finding that features in their new system. But though we might not wholly deny the existence of *-r- in the OC inventory, such overestimation may lead to confusion and in turn undermine the veracity and credibility of their *-r-, so we had better eliminate it in almost all cases. It may also well be that *-r- is just a tool for justifying the QYS (Qieyun system) categories, with no basis in phonophorics (for Type B syllables) or rhyme patterns.

    Now, 葫蘆. If this word appears in the Western Han or later, it could be a binome for 瓜:

    葫蘆 *gʷá-*rá > gɔ-rɔ

    that occurred sometime just a bit after Emperor Wu's reign (141-87 BCE), which would then be homophonous with earlier gâ-râ (ca. 50-0 BCE)

    [Note 1: "Fox" 狐 is a main reason for rejecting the existence of *-r- in OC 瓜. This is a case of Schuessler's "a archaism" for words whose vowels didn't become -ɔ due to the influence of "rural" topolects that never changed along with the big tide of sound changes. Anyway, since there are no *r- initial words with 瓜 as phonophoric, 瓜 may simply be *kʷá in reality, though *kʷ(r)á is permissible as a notation for the sake of further theorization.]

  7. Victor Mair said,

    February 9, 2020 @ 10:00 am

    I'm pleased that the evidence provided by Diana would seem to support my suggestion that húlu 葫蘆 ("gourd; calabash; cucurbit") may represent the disyllabicization of guā 瓜 ("gourd; melon").

  8. Chris Button said,

    February 9, 2020 @ 3:25 pm

    The uvular *q- onset in 瓜 *qráɣ accounts for the rounding in Middle Chinese. The situation is the same in 桂 *qájs above. Compare the effects of the *-q coda in the Old Chinese -aq rhyme group, where Baxter & Sagart have labiovelar -wk (i.e., -kʷ). But they then can't account for the occasional unrounded reflexes or for the lack of a labiovelar nasal counterpart (-wŋ or -ŋʷ), while the lack of a uvular nasal counterpart is entirely predictable.

    The -r- in 瓜 *qráɣ is needed for the Middle Chinese vocalism. Baxter & Sagart would have put it in parentheses if it had been possible to leave it out in their system. There are cases where the vocalism conditioned by medial -r- occurred without any actual medial -r-, but Baxter & Sagart don't recognize those. We discussed one such example on LLog a while ago with the case of 荼 and 茶. Both should be reconstructed as Old Chinese *láɣ, but the latter develops as if a medial -r- had been present (and so Baxter & Sagart arbitrarily reconstruct it with one). In fact, it seems that there was sporadic lengthening of *láɣ to *láːɣ in 茶, conditioned by its effectively being an open syllable in terms of surface phonetics (but not underlying phonology). An account of the parallel evolution of medial -r- with vowel length is found in Pulleyblank's OC glottals paper (1995).

  9. Chris Button said,

    February 9, 2020 @ 4:03 pm

    Incidentally, one thing I've never understood is why people take issue with the many cases of medial -r- (notwithstanding cases like 茶 that have been reconstructed erroneously with it). Excluding -j- and -w- (which are another matter to do with the phonemically misled but phonetically broadly justifiable "front/rounded vowel hypothesis"), OC clusters were unsurprisingly confined to the liquids -r- and -l- (i.e., plan, pran but not ptan, pkan). The rhotic -r- just had a really overt impact in terms of Middle Chinese vocalism, while -l- is often not properly identified.

    Fortunately, all is not lost with -l- since we do have Schuessler's reconstruction of *kʰl- in cases like 出 (far more likely than proposals such as Baxter & Sagart's *t-kʰ-, in which the *t- prefix seems entirely ad hoc to me). This can be extended to non-aspirated cases too. A nice example is 織, commonly reconstructed with *t-, but whose *kl- cluster is supported by its phonetic series (ultimately going back to 弋 with *l-) and its Tibeto-Burman counterparts with liquid onsets.
