Morphemes without Sinographs

Commenting on "Educated (and not so educated) guesses about how to read Sinographs" (11/16/21), Chris Button asked:

I’m curious what you mean by “pseudo explanation”? The expected reflex from Middle Chinese times is xù, but yǔ has become the accepted pronunciation based on people guessing at the pronunciation in more recent times. Isn’t that a reasonable explanation?

To which I replied:

It's such a gigantic can of worms that I'm prompted to write a separate post on this mentality. I'll probably do so within a few days, and it will be called something like "Morphemes without characters".

Stay tuned.

And here's the promised explanation:

It's all the guessing business that the previous post was about.  Imprecision.

It's similar to the běnzì lǐlùn 本字理論 ("native / original character theory") that I have struggled against for decades.

What is this běnzì lǐlùn 本字理論 ("native / original character theory")?  It is the ardent belief that for every Sinitic morpheme there is a corresponding, and, in the minds of many proponents of this theory, a preexisting, Sinograph.

There are countless morphemes in the host of Sinitic languages and topolects for which there are no known characters.  I have written about this phenomenon scores of times on Language Log (see the "Selected readings" below for some examples).  For the last four thousand years and more, innumerable morphemes have arisen and entered the Sinitic lexicon.  Often we have no idea where these new morphemes came from, and frequently they come from non-Sinitic languages.  Such being the case, how could there possibly be a preexisting Chinese character for them?  There simply is no běnzì 本字 ("native / original character") for each and every morpheme in Sinitic.  Quite the contrary, morphemes come first, and characters are devised to write them.  In other words, in terms of the evolution / sequence of morphemes vs. graphemes, the former are preexisting and the latter are secondary.

When people notice that there is an unwritten / unwritable morpheme floating around in the verbal lexicon and they decide it's something they want to write down, they cannot just transcribe the sounds of the new morpheme (or word) as is done with languages that use a phonetic script.  Rather, they either have to invent a completely new character or borrow another character that has the same sound as the target, characterless / benziless morpheme.

In a way, the běnzì lǐlùn 本字理論 ("native / original character theory") under discussion here is the reverse of the educated guessing theory brought forward at the beginning of this post and in the previous post to which it alludes.  Namely, there you have a character, don't know how to pronounce it, and often are not sure what it means, so you make a more or less "educated" guess how to pronounce this unknown character and what it means.

In both cases, it is wishful thinking.  Such procedures are not at all scientific and should be laughed out of the courts of phonology and orthography.

Nonetheless, all of this talk about běnzì lǐlùn 本字理論 ("native / original character theory") and guessing how to pronounce unknown characters takes me back to some pleasant, prolonged bǐzhàn 筆戰 ("pen / brush battles / wars")  that I had during the 70s and 80s with an old Taiwanese scholar named Wu Shou-li, who was the most eminent authority on Fukienese of that era.  Our polite polemics really were bǐzhàn 筆戰 ("pen / brush battles / wars"), because that was in the days before computers, and we had to write out our respective sides of the debate and send them through the mail.

I was delighted to find this nice article about Professor Wu online:

"The Tongue-Tied Fate of Wu Shou-li", by Chen Kwe-fang, translated by Phil Newell, with photos by Wang Wei-chang, Taiwan Panorama (December, 1989).

Professor Wu would say to me, "Professor Mair, I'm sure I can find the běnzì 本字 ('native / original characters') for every word in Fukienese, though I must admit that I haven't found them yet.  So I have to keep looking."  To which I would reply, "I respect your tenacity, Professor Wu, but I believe you could search for the rest of your life and you'll never find the běnzì 本字 for thousands of morphemes in Fukienese".  Consider even such very common ones as chhit-tho ("play"), which is written with 七桃 ("seven peaches") and other outlandish borrowed characters.  See also the great dictionary of spoken Amoy by Carstairs Douglas, which has many entries lacking solidly established Sinographic forms.

And we would let it rest at that until the next round.


Selected readings



  1. Andreas Johansson said,

    November 19, 2021 @ 2:00 am

    Is using a preexisting character for a homophonous morpheme really any different from using phonetic spelling to write a normally-unwritten word or morpheme in an alphabetic script?

  2. John Swindle said,

    November 19, 2021 @ 5:03 am

    @Andreas Johansson: Yes, it’s different. Borrowing a Chinese character may be more like borrowing a written word than borrowing alphabet letters. Chinese characters are supposed to have not only a pronunciation but also a meaning or range of meanings. Alphabet letters in general don’t have that. If you could just use any character that sounds the same you’d eventually end up with a nice syllabary and not the glorious confusion that is written Chinese.

  3. ~flow said,

    November 19, 2021 @ 5:22 am

    Looking up 本字 on zdic.net helpfully gives the following:

    本字 běnzì [the original form of a character, as opposed to its present form] 一个字通行的写法与原来的写法不同,原来的写法就称为本字,如“燃”的本字是“然”

    This is the meaning of 本字 that I've been accustomed to from its use in dictionaries. This is quite different from the usage discussed in your article, isn't it? It basically just says, "this is the form that was intended by its creators, before later generations altered it". There's no claim that all morphemes must have a 本字, or that the 本字 is the more correct way to write a given morpheme. There *is* a claim that we can meaningfully 'translate' or transform older character styles as seen on oracle bones and bronze vessels and seals into equivalent 楷書 forms. The Kangxi dictionary is of course full of these "what-if" shapes, of which many were presumably never used in actual text, basically telling you that "if this character had only undergone systematic stylistic changes, then this would be the way one would write it today". I will say that I cannot for certain keep this concept of 本字 apart from 古文 (other than the latter being specifically used to indicate 'pre-Qin-unification form'), but that is no different from many of the labels for variants that one finds in dictionaries (俗, 正, 同, 通, etc.), the concrete application of which is apparently a bit of a matter of taste.

    Is there a paper or other source that establishes the use of 本字 in the sense of "the […] belief that for every Sinitic morpheme there is a corresponding, and, in the minds of many proponents of this theory, a preexisting, Sinograph"?

  4. Jonathan Smith said,

    November 19, 2021 @ 5:53 am

    See e.g. at

    "本字: 相對於「訓讀字」、「借音字」、「新造字」而言。指在傳統文獻中即有漢字字形,且意義與閩南語詞彙具同源關係的用字。例:表示「香」的phang本字為【芳】、表示「小」的sè本字為【細】。"

    The definition you cite doesn't reference semantics which are crucial…. the example given there really means that the word in question ('burn') was first written "然"; it is this idea which expands to that expressed just above. (Both quotes obviously face the problem that it is hard to examine the issues objectively/detachedly when one's medium is itself Chinese characters…)

  5. Twill said,

    November 19, 2021 @ 6:41 am

    Surely if, as it appears in several cases, the most parsimonious and plausible explanation for a reading of a character is that it was matched to an identifiable phonetic series of a supposed phonophore it contains, it cannot be simply dismissed as unscientific. I appreciate that English and Chinese are worlds apart, especially script-wise, but if we can validly infer that e.g. the dominant pronunciation of "forehead" is based in orthography, I don't see why we would exclude that from consideration a priori for Chinese.

  6. Chris Button said,

    November 19, 2021 @ 7:02 am

    I’m confused by the suggestion of any association with 本字.

    We have a character 嶼 that should logically still be pronounced in MSM as xù (a pronunciation noted in dictionaries), but tends to be pronounced nowadays as yǔ. How is that any different from “niche” in English, which more and more Americans (not sure about English speakers elsewhere) seem to be rhyming with “itch” nowadays instead of “quiche”.

  7. ~flow said,

    November 19, 2021 @ 7:25 am

    So if I understand correctly 表示「小」的sè本字為【細】 seems to tell me that the word/morpheme sè means 'small' in present-day 閩南語 which is best glossed as '小', however it is historically derived from the same root that in modern Mandarin is written '細', which happens to mean 'slender, thin', so there's a conflict if one wants to write sè with characters: '小' better expresses the current meaning, but '細' is historically more correct.

    Now, the linked text states that 對「本字」的認定則差異不大,學者們的考證也在逐漸累積可信的本字。但閩南語中仍有許多來源不明的字、詞,根本非源於漢語,造成「有音無字」的狀況 […], so while asserting that there are 本字 in the above sense that scholars have been able to identify, there are also those "characters and words of unclear origin [in Minnanhua] that do not come from Sinitic (漢語), resulting in 'sounds with no way to write them'". A close reading of this passage would seem to suggest that all Sinitic morphemes have a 本字, but one can also understand it as implying that "if Sinitic, then 本字, if not Sinitic, then, naturally, no 本字". This begs the questions whether (1) 葡萄 count as the 本字 of 'grape', a word of foreign origin, and (2) whether 足 counts as the 本字 of both the morpheme meaning 'foot, leg' and the other one meaning 'complete, enough'.

  8. Victor Mair said,

    November 19, 2021 @ 8:38 am

    @John Swindle

    Thank you very much for clarifying the difference between borrowing a Chinese character and borrowing alphabet letters.


    @~flow

    I was aware of the standard dictionary definition of běnzì 本字, which you have given a good account of. My usage of běnzì lǐlùn 本字理論 ("native / original character theory") in this post and elsewhere with regard to morphemes that are lacking well established characters is the product of long correspondence and face-to-face discussions with Wu Shou-li and other proponents of the view that it espouses.

    @Jonathan Smith

    Thank you for the extremely helpful, informed, and insightful remarks on the nature of běnzì 本字 in the context of the present debate.


    @Twill

    Characters are not always borrowed with due attention to "identifiable phonetic series".

    @Chris Button

    It's the methodology, approach, attitude, mentality… that are similar in the two cases.


    Such are the complexities and vagaries of the issues with which we are dealing.

  9. Andreas Johansson said,

    November 19, 2021 @ 9:44 am

    @John Swindle:

    I think you may have misunderstood me. Alphabetic letters don't generally carry meanings, it's true, but I was speaking about words or morphemes.

    Consider, say, English "pane" and "pain". They're homophonous* but not interchangeable, with each spelling indicating a different meaning. Yet, if I needed to write some normally-unwritten dialect word that happened to share the pronunciation /pein/, I could use either spelling to indicate this pronunciation**.

    Now, such heterographic homophones are not found in all alphabetic orthographies, but they're very common, especially in long-established written traditions.

    * With a few dialectal exceptions.

    ** I could also use a normally-unused spelling like "playn", which I guess has no analogue in Chinese?

  10. Andreas Johansson said,

    November 19, 2021 @ 9:58 am

    … where "playn" of course was supposed to be "payn".

  11. Michael Carasik said,

    November 19, 2021 @ 3:04 pm

    Wondering whether you've ever chatted with the people over at the museum about the similarities and differences between reading Chinese characters and reading cuneiform? I for one would be interested in this comparison.

  12. Thomas Rees said,

    November 19, 2021 @ 4:43 pm

    @Andreas Johansson:

    “Payn” exists; it’s the surname of Noël Coward’s lover Graham. And there are Sinographs that are rarely used except as surnames.

  13. David Marjanović said,

    November 19, 2021 @ 7:09 pm

    Cuneiform is a very Japanese-like mix of characters that carry meanings and stand for word stems and characters that (more or less) don't carry meanings and stand for syllables.

  14. John Swindle said,

    November 19, 2021 @ 7:47 pm

    @Andreas Johansson: Sorry if I misunderstood you! I think the fact that individual Chinese characters are considered to have meaning (whether in writing a particular word they do have identifiable meaning or not) makes the two cases different in the way I described, but I can see that there are parallels.

    @Victor Mair: You're welcome.

    So the newly coined word běn zì 'original character' is written with the existing Chinese characters "本字", originally used to write a word of the same pronunciation and related meaning. Yes, I'm joking.

  15. Victor Mair said,

    November 19, 2021 @ 10:01 pm

    From Chau Wu:

    I love this post and the comments, Professor Mair! Taiwanese vocabulary is rich in morphemes without Sinographs, the so-called "Yǒu yīn wú zì 有音無字" group of words.

    I have a 2-volume set of books by the late Professor Shou-li Wu: Mǐn Tái fāngyán yánjiū jí 閩台方言研究集 (1) and (2).

    His son, Jau-Shin Wu 吳昭新醫師 (MD, PhD), a specialist in viral hepatitis, has set up a Website for the senior Wu: Táiyǔ tiāndì 台語天地 (World of Taiwanese) is still running –

  16. Levsha said,

    November 20, 2021 @ 8:21 am

    folk sinography

  17. Not a naive speaker said,

    November 20, 2021 @ 3:02 pm

    @John Swindle:

    … not the glorious confusion that is written Chinese.

    I like this definition

  18. KIRINPUTRA said,

    November 24, 2021 @ 9:55 am

    The benzi racket is essentially a set of religious beliefs about the history. We'll "struggle" against it for the rest of our lives, unless we outlive the Republics of China.

    "Rather, they either have to invent a completely new character or borrow another character that has the same sound as the target, characterless / benziless morpheme."

    Or borrow a Sinograph by meaning, as widely done for languages that are etymologically & geographically distant from Middle China. The benzinauts hate meaning borrowings, esp. ones that are obviously meaning borrowings.

