The politico-cultural implications of Taiwanese romanization


Which do you think is harder — reading and writing Taiwanese with characters (sinographs) or with romanization?

I maintain — and I have tried to show over the years — that it's much easier to read Taiwanese written with roman letters than with Chinese characters.  The same is true of all vernacular Sinitic languages.

It is relatively easy for a speaker of Taiwanese to become literate in roman letters, not at all so in characters.  See the posts under "Selected readings" below.

There have been many efforts to write Hokkien / Hoklo / Taiwanese dictionaries.  Even today, I think that the monumental Chinese-English Dictionary of the Vernacular or Spoken Language of Amoy by Carstairs Douglas (1830-1877) remains the best.

The problem is that all lexicographers of vernacular Sinitic topolectal dictionaries who attempt to include sinographs get tripped up by them.  In truth, there are many morphemes in these languages that lack verifiable sinographic equivalents, yet traditional Chinese scholars insist that all morphemes in "Sinitic" "dialects" (which is a sick / sad joke, because dialects are supposed to be mutually intelligible, whereas — as we have recently seen in a life and death trial — not even Taishanese and Cantonese, much less Cantonese and Mandarin can be proven to be "dialects" of a single language, even though that is a dogma subscribed to by countless legions) ineluctably are connected to a specific sinograph.

This is a barrier that the traditionalists pretend to get around by a sneaky subterfuge they call běnzì lǐlùn 本字理論 ("native / original character theory") that I have struggled against for decades.

What is this běnzì lǐlùn 本字理論 ("native / original character theory")?  It is the ardent belief that for every Sinitic morpheme there is a corresponding, and, in the minds of many proponents of this theory, a preexisting, Sinograph.

There are countless morphemes in the host of Sinitic languages and topolects for which there are no known characters.  I have written about this phenomenon scores of times on Language Log (see the "Selected readings" below for some examples).  For the last four thousand years and more, innumerable morphemes have arisen and entered the Sinitic lexicon.  Often we have no idea where these new morphemes came from, and frequently they come from non-Sinitic languages.  Such being the case, how could there possibly be a preexisting Chinese character for them?  There simply is no běnzì 本字 ("native / original character") for each and every morpheme in Sinitic.  Quite the contrary, morphemes come first, and characters are devised to write them.  In other words, in terms of the evolution / sequence of morphemes vs. graphemes, the former are preexisting and the latter are secondary.

When people notice that there is an unwritten / unwritable morpheme floating around in the verbal lexicon and they decide it's something they want to write down, they cannot just transcribe the sounds of the new morpheme (or word) as is done with languages that use a phonetic script.  Rather, they either have to invent a completely new character or borrow another character that has the same or similar sound as the target, characterless / benziless morpheme.

In a way, the běnzì lǐlùn 本字理論 ("native / original character theory") under discussion here is the reverse of the educated guessing game where you have a character you don't know how to pronounce and often are not sure what it means, so you make a more or less "educated" guess how to pronounce this unknown character and what it means.

In both cases, it is wishful thinking.  Such procedures are not at all scientific and should be laughed out of the courts (!) of phonology and orthography.

Nonetheless, all of this talk about běnzì lǐlùn 本字理論 ("native / original character theory") and guessing how to pronounce unknown characters takes me back to some pleasant, prolonged bǐzhàn 筆戰 ("pen / brush battles / wars")  that I had during the 70s and 80s with an old Taiwanese scholar named Wu Shou-li, who was the most eminent authority on Fukienese of that era.  Our polite polemics really were bǐzhàn 筆戰 ("pen / brush battles / wars"), because that was in the days before computers, and we had to write out our respective sides of the debate and send them through the mail.

I was delighted to find this nice article about Professor Wu online:

"The Tongue-Tied Fate of Wu Shou-li", by Chen Kwe-fang, translated by Phil Newell, with photos by Wang Wei-chang, Taiwan Panorama (December 1989). [may no longer be available]

Professor Wu would say to me, "Professor Mair, I'm sure I can find the běnzì 本字 ('native / original characters') for every word in Fukienese, though I must admit that I haven't found them yet.  So I have to keep looking."  To which I would reply, "I respect your tenacity, Professor Wu, but I believe you could search for the rest of your life and you'll never find the běnzì 本字 for thousands of morphemes in Fukienese".  That holds even for such very common ones as chhit-tho ("play"), which is written with the borrowed characters 七桃 ("seven peaches") and other outlandish characters.  See also the great dictionary of spoken Amoy by Carstairs Douglas, which has many entries lacking solidly established Sinographic forms.

And we would let it rest at that until the next round.

[This section has been adapted from parts of the first post in the list of "Selected readings" below.]

When Carstairs Douglas published his monumental dictionary of Amoy vernacular in 1873, there was not a single character in it. In 1923, Thomas Barclay (1849-1935) published from the Commercial Press in Shanghai a Supplement to Douglas's dictionary. Although Barclay added characters for many of the entries, he still left numerous entries without any characters assigned to them. For the completely new entries added by Barclay, most lacked characters. This is in sharp contrast to a bizarre dictionary compiled by the Department of Sinitic Topolects in the Institute for Chinese Languages and Script of Amoy University and published to great fanfare in 1982. All of the entries have characters and MSM pronunciations, by which they are ordered under head characters. All definitions are given in MSM and, indeed, the MSM elements of the dictionary are openly based on the well-known Xiandai Hanyu cidian (Dictionary of Modern Sinitic [i.e., Mandarin]). A sizable portion of the entries in this dictionary from Amoy (MSM Xiamen) University are not really authentic Southern Min terms at all, but are simply Mandarin words with Southern Min pronunciations added to them.

[This section has been adapted from a paragraph of the last post in the list of "Selected readings" below.]

Understandably, exponents of Taiwanese literacy have been eager to create a user-friendly, comprehensive dictionary for their language, but their proposals inevitably fall afoul of each other, as evidenced in this piece by Chén Cún 陳存:

Guāndiǎn tóushū: Qiǎnzé “Táiwān tái yǔ chángyòng cí cídiǎn” chōngmǎn miùwù

觀點投書:譴責《台灣台語常用詞辭典》充滿謬誤

"Opinion Letter: Condemning the 'Taiwanese Common Words Dictionary' for being full of fallacies"

Storm Media (1/3/25)

A bugbear or stumbling block or whatever you want to call it that stands in the way of those who want to create a good dictionary of Taiwanese (or other non-standard, topolectal variety of Sinitic) is what to do with the characters.  Tracing backward in time to review the best dictionaries of Taiwanese, I find that, so far as I can tell, they are all arranged alphabetically.  Here are two from the 1970s:

 
A Dictionary of Southern Min. By Bernard L. M. Embree. Hong Kong: Hong Kong Language Institute, 1973. Pp. xlvi + 305.
 
Amoy-English Dictionary. By Maryknoll Fathers. Taichung, Taiwan: Maryknoll Language Service Center, 1976. Pp. iv + 946.
 
They were favorably reviewed by Cornelius C. Kubler in Journal of Chinese Linguistics, 7 (1979), 120-124.  I consider myself fortunate to have had access to these two dictionaries near the beginning of my career.
 
I was particularly interested in the one by Bernard Embree (1923-1997) because he has the same surname and lived around the same time as Ainslie Embree (1921-2017), who was also Canadian, and was a distinguished professor of Indian history at Columbia University from 1958 to 1991.  I cannot be certain that they were brothers, but I suspect that they were relatives.

In any event, Bernard Embree's dictionary had many virtues, including the fact that he checked all his entries against those of the stellar works of Douglas and Barclay mentioned above, as well as that of Ernest Tipson, Chinese-English pocket dictionary of the Amoy vernacular. Taichung: Maryknoll House, 1935.

Now it is past time for the next dictionary of Taiwanese, and, as we have seen above, there are squabbles.  I asked a number of colleagues who are in the midst of compiling their candidates for this generation's new dictionary of Taiwanese for their opinion about the dispute.  One for whose work I have high regard is A'ióng, who replied to me thus:

As you probably know, I am generally uninterested in digging too deeply into the often mythological Kanji origin stories of various words. That's 20,000 landmine-filled rabbit holes I prefer to avoid. I normally write in Lomaji and ignore all such discussions for the sake of my sanity (and hopefully making the world a better place). But if I need to use a Kanji for some reason I typically follow whatever the folks at TÂI-JĪ CHHÂN 台字田 have come up with, which in the case of pō͘-pîn is 歩屏, purely a sound loan as is typical in pre-ROC era Taiwanese writing.
 
That being said, I agree that the quality of work done by the folks at ROC MOE [Republic of China Ministry of Education — educational bureaucrats] is generally quite poor. You can tell by the fact that they're now calling it “Taiwanese Taiwanese” (台灣台語). As if there were any other.

As I believe I've mentioned before, my brother's Taiwanese mother-in-law had been completely illiterate for the first half of her life, but after that had the good fortune to take classes in POJ.

Pe̍h-ōe-jī (/peɪweɪˈdʒiː/ pay-way-JEE; Taiwanese Hokkien: 白話字, pronounced [pe˩ˀ o̯e̞˩ d͡ʑi˧], lit. 'vernacular writing'; POJ), also known as Church Romanization, is an orthography used to write variants of Hokkien Southern Min, particularly Taiwanese and Amoy Hokkien, and it is widely employed as one of the writing systems for Southern Min. During its peak, it had hundreds of thousands of readers.

For the second half of her life, she was literate.

Mutatis mutandis, the same is true for all other topolects and languages in China.  Having their own written language will enable the speakers of a non-Mandarin or non-Sinitic language to write their own literature and preserve / disseminate their own culture.

My own mother was a Christian, and she often spoke about "false belief".  Respecting her, it is my opinion that the dogma that all morphemes of Sinitic topolects and languages come with a predestined běnzì 本字 ("original character") is a "false belief".

That's as much as I want to say today. I think that Kirinputra will take up the thread of the search back in time beyond the advent of the ROC on Taiwan.  I should mention that the promotion or promulgation of Taiwanese was illegal under the KMT in the early decades of the ROC.  I knew people who were imprisoned for doing so.

Afterword:  A tale of two tails

Yesterday as I was walking on the street with one of my students, she used the expression "èryǐzi".  I told her I didn't understand that term.  She said it means "a person with two tails".  Hmmm.  I said, "Oh, you mean 'yīgè rén yǒu liǎng ge wěibā 一个人有两个尾巴' ('a person with two tails')?"  "Yes", she replied, "but we don't pronounce '尾巴' as  'wěibā', we pronounce it as 'yǐba'."  (Well, I [VHM] have been pronouncing "尾巴" as  "wěibā" for more than half a century, but I didn't make a point of it with her.)  She continued, "That's why we pronounce this term as 'èryǐzi'."  After a while, together we figured out that she was trying to say a Chinese equivalent of "hermaphrodite", but it's derogatory (also means "sissy") and regional (north central and slightly to the west).  I'm simplifying things quite a bit, because it's late and I have to catch a flight tomorrow morning.  Of course, Chinese also has more neutral, anatomical ways to say "hermaphrodite; intersex person".  The reason I bring up this "Tale of two tails" here is because it illustrates how the same character can confuse people more than help them when they're talking about something and bring up the characters.  And all of this was done using different topolects of "Mandarin".

BTW, this odd regionalism, "a person with two tails", is also a part of the vocabulary of Dungan speakers, who use Cyrillic to write their language, not sinographs, hence эрйизы.  For Dungan, see this post, "'Thanks' in Hakka and other Sinitic topolects" (2/15/25), and many other Language Log posts, some of them listed in the "Selected readings" below.

This sort of thing happens countless times when you read, write, and speak Chinese languages and topolects.  I sort of get used to it.

 

Selected readings



28 Comments »

  1. Chester Draws said,

    March 12, 2025 @ 11:02 pm

    Over the centuries there have been many languages that have moved from characters towards alphabets. Has there been a single case of an alphabetic language voluntarily moving towards characters? The modern emoji is a bit of an example, but that's all I could think of.

    The case for alphabets over characters is over-whelming.

  2. Chas Belov said,

    March 13, 2025 @ 1:14 am

    Hope you made your flight. When you get a chance:

    Regarding two tails, ¿how does wěibā 尾巴 for hermaphrodite differ from cíxióngtóngtǐ 雌雄同體 for hermaphrodite, or are they interchangeable?

  3. Peter Cyrus said,

    March 13, 2025 @ 6:25 am

    May I make a suggestion? Instead of promoting romanization, please consider advocating the use of a better alphabet. The Roman alphabet doesn't have letters for many of the sounds needed, it doesn't show the division into syllables as well as characters, and it seems like an admission that Western civilization is superior.

    You could take a look at musa.bet/zh for an example of what's feasible.

  4. John Rohsenow said,

    March 13, 2025 @ 6:31 am

    Although I use the word "wěibā" for 'tail', I have always sung the children's song, that I suppose most foreign language learners of Mandarin learn, as:
    Liǎng zhī lǎohǔ, liǎng zhī lǎohǔ; pǎo de kuài, pǎo de kuài;
    yīzhī méiyǒu ěrduo, yīzhī méiyǒu *yǐba*; zhēn qíguài, zhēn qíguài.

    [Two tigers, two tigers, running very fast, running very fast;
    one has no ears, one has no tail; very strange, very strange!]

    w/out thinking very much what the characters for 'yǐba' were/are.

  5. Victor Mair said,

    March 13, 2025 @ 6:59 am

    @Peter Cyrus:

    Most of the alphabets of the world don't have letters that match the sounds of the languages for which they are used one-for-one.

    In terms of grammar, syntax, lexicon, etc., it's more important to join syllables into words and other meaningful units than to split them up.

    The alphabet is not "an admission that Western civilization is superior." For starters, think of all the "bahasa" of Southeast Asia. "Bahasa and the concept of 'National Language'" (3/14/13)

  6. Victor Mair said,

    March 13, 2025 @ 7:02 am

    @John Rohsenow:

    Very nice! I learned that children's song too, and I loved it.

  7. Victor Mair said,

    March 13, 2025 @ 7:03 am

    @Chas Belov:

    Don't use "èryǐzi" among polite company.

    Heading for the airport now.

  8. Jonathan Smith said,

    March 13, 2025 @ 7:35 am

    "Write Taiwanese (or whatever language) in characters" is not at all the same thing as "try to find and employ so-called běn​zì for every morpheme." It wouldn't matter how or how much the characters used to write say Taiwanese overlapped with those used to write say Mandarin or whatever else. A Taiwanese orthography is/would be a system unto itself where the only relationships that mattered would be those with words of Taiwanese.

    There is no such thing as "verifiable sinographic equivalents" for the words of any language anywhere; this phrase doesn't make sense.

    "Best" system to use would be "whatever they've been using" i.e. maintain convention. So e.g. to the extent Lô-má-jī was or remains conventional it would have been or would be nice to retain it. Same for the Koa-á-chheh character orthography. But if the conventions are dead or nearly so then you probably don't have a practical argument in these terms, just an emotional one — which would not be meaningless.

    But again the question of orthography is neither here nor there absent a viable community of language users to sustain it. This is 99% of the problem wrt Taiwanese and all others except Cantonese (for now at least), where (at least when I was last looking over their shoulders) people write in a character-based informally-conventionalized system because they need/want to and generally without engaging in endless ideological debates about it; crazy how that works.

  9. J.W. Brewer said,

    March 13, 2025 @ 8:32 am

    The Roman alphabet isn't even a good fit for English (or for several modern Romance languages). Reject the false dichotomy that it's kanji or romaji with no other options! Revive the extended version of katakana that was devised for writing Taiwanese phonemically almost a century ago! Or modify bopomofo to the same end. Stop catering to tourists. (The Presbyterians can of course continue to do as they like.)

    If you really want a script of non-Asian origin, maybe Cyrillic-as-modified-for-Dungan can be readily further-modified for other Sinitic languages?

  10. Jonathan Smith said,

    March 13, 2025 @ 8:55 am

    @J.W. Brewer: Such a Bopomofo-based system was devised 80ish years ago; see the Wikipedia article.

    A problem in this and parallel cases is cool new ideas; see esp. Kirinputra's classic Confessions of an Ex-Hokkien Creationist or xkcd on (would-be) standards. People-power (or OK fascist power) can just plow over these things but when too few people care too much, you just go round and round in dumb circles.

    Notable is that activists see e.g. the government-promoted systems as malicious. But "never attribute to malice that which is adequately explained by stupidity," replacing Hanlon's "stupidity" (which is mean) with "enthusiasm for one's cool new idea!!!"

  11. Mark Young said,

    March 13, 2025 @ 8:57 am

    @Victor: Sorry to disappoint you.

    Bernard and Ainslie Embree were not brothers or first cousins on their fathers' sides. Ainslie's parents and grandparents were all born in Nova Scotia, while Bernard's father and paternal grandparents were born in the USA. Following Bernard's male line back leads from Washington to Missouri to Kentucky, none of which were significant sources of migration to or from Nova Scotia. Thus even second or third cousin is unlikely.

    There is a (very) faint chance that Bernard and Ainslie were related on their mothers' sides, as Bernard's mother and her parents were from Nova Scotia. In that case the Embree name would just be a coincidence.

  12. Victor Mair said,

    March 13, 2025 @ 10:56 am

    @Mark Young:

    Thank you very much for that diligent research on the Embrees! I really appreciate it.

  13. David Marjanović said,

    March 13, 2025 @ 9:40 am

    "You could take a look at musa.bet/zh for an example of what's feasible."

    Have you developed a version that can be written by hand? Drawing all the little similar strokes of the printed version by hand looks like it would take forever. With Chinese characters, the fixed stroke order and the very small number of stroke types makes it possible to decipher some very fluent handwriting; but I can't see how that would work with your invention.

    "Revive the extended version of katakana that was devised for writing Taiwanese phonemically almost a century ago! Or modify bopomofo to the same end."

    Or Hangeul.

  14. A'ióng said,

    March 13, 2025 @ 10:14 am

    Well as I already told the Professor, I don't care for ideological debate about Kanji usage. I also find it hilarious that people interested in languages (supposedly) don't understand that alphabets have diddly squat to do with bugger all. You can create a mapping between sounds and chicken scratch however you damn well please and it'll work just as well as any other halfway well thought out such system. (In truth of course, many are not interested in language as much as they are in antiwestern politics, but I digress.)

    That being said, I would like to point out that I do understand the shortcomings of romanization as well, and that my own preference is due mainly to pragmatic (and partly to historical) considerations, not any ideological bent in favor of my native script. To drive this point home, note that for years now I have said that, insofar as such a thing is possible, the “objectively best” script for Taiwanese is Hangul. There is even an Android app that uses Hangul for inputting Taiwanese. You can type on a Hangul keyboard and output Kanji or, get this, romanization (how pragmatic)! In fact, it wouldn't really be too much of a stretch to say that Hangul was designed at the outset *specifically* for Taiwanese, or a language sufficiently similar that it may as well have been. So why wouldn't Taiwanese people ever adopt it? Easy: they don't think it's “Sinitic” enough. Never mind that *gugeo* and *kokgü* (if the kokgü were Taiwanese) have some ridiculously high percentage of ~75% overlap in word etymologies.

  15. KIRINPUTRA said,

    March 14, 2025 @ 4:25 am

    Victor, you make many good points, and some puzzling ones…. What do you believe to be true? What’s your interpretation of the facts?

    Regarding the Neo-Sinological pseudo-etymology: You’re not drinking the Kool-Aid yourself, but…. On a motorbike, staring at an object to “make sure you don’t hit it” is not the best way to not hit it. Strangely enough, fixating on something for any reason multiplies the likelihood that you’ll hit it. This is like that.

    Some comments, sequenced to minimise cognitive load, hopefully.

    — The romagraph is easier to LEARN than the sinograph as we know it. This is true with ANY language, incl. Japanese. Learning aside, it’s not clear that any writing system is easier to use than any other.

    — What’s “Sinitic” got to do with it? What IS “Sinitic” anyway? What is the basis of this concept that keeps popping up in (y)our analysis?

    — Literacy in romanised Taioanese is relatively easy to acquire. (Well, kids make quick work of sinographs too….) But FLUENT literacy is HARD to acquire as an adult.

    — In practical terms, for ex., from constant experience: Messages (esp. @ a group or list of people) in romanised Taioanese are often ineffective b/c so many recipients — being chronically slow readers — only skim, or put the msg off for later and forget. (In the long run, of c., people wouldn’t be learning as adults. But ease is a false promise in the context of adult learning.)

    — The Japanese script is clearly harder to learn than the Vietnamese romanisation. But does this mean anything for anybody? There is nothing up that tree.

    — There should be no assumption that romagraphic & sinographic Taioanese are either-or. (To me it makes sense to face that possible outcome, though, to eliminate it unless desired by all.)

    — Douglas (Hokkien-English) is great. So is Ogawa (Taioanese-Japanese, but arguably also Amoy-Japanese). So is Tiuⁿ Jū Hông’s 白話小詞典 (Taioanese-Mandarin). These three are indispensable.

    — Embree is underrated, and very good. Among other things, it reliably reflects actual romagraphic usage. (This might need explaining — some other time.) And it’s more or less modern, unlike Ogawa (or Douglas).

    — I don’t think I’d ever heard of the Tipson dictionary. You learn something every day.

    — The Maryknoll dictionaries are not good for most purposes. A lot of the entries “are simply Mandarin words with Southern Min pronunciations added to them”, I guess reflecting how modern Formosan Catholics worship. Underrepresented are basic verbs, words with no clear Mandarin cognate (incl. high-use function words), etc. (The Maryknoll TAIWANESE textbooks, made by the same two gentlemen, are really good.)

    — Tiuⁿ Jū Hông’s 白話小詞典 is state-of-the-art, for the most part. This book is systematically ignored in some circles.

    — Douglas DOES get into the “etymological sinograph” guessing game, but subtly. To save time & expense, he withheld the actual sinographs, and only identified them via Han reading. Like Tiuⁿ Jū Hông (and Ogawa, but less so), Douglas didn’t force guesses, and most — prob. over 90% — of his assignments are sound AFAIK.

    — You’re right that Original Character Theory is wishful and unscientific. But it’s not traditional.

    — In a Hoklo context: It has roots in the 19th cen., in the currents that gave rise to Chinese (Neo-Chinese) nationalism. We see what looks like embryonic Orig. Character Theory in Douglas, and inchoate elements in the early 19th-cen. SI̍P-NGÓ͘ IM 十五音. But the endeavor seems to have really got under way in the 1910s in the Hokkien context, and after about 1925 in Formosa.

    — Original Characters were & are a learned (士) preoccupation. (The learned class has effectively expanded, though.)

    — TRADITIONALLY, speech was just air, not worth a thought, esp. a learned man’s thought. Theories of nationalism — and the danger of being colonised — forced the Chinese “brahmin” to think about speech, since a national Sprache or some similar arrangement is needed for nation-building. (So the Indian armed forces operate in Hindi; the Canadian armed forces are bilingual, and require bilingual fluency in officers over a certain grade.)

    — Intellectually, this newfound interest in speech intersected with the self-conscious (via the Western gaze?), insecure (via the Manchu ruling class?) neo-idea that sinographs — mystic, ancestral, and infinitely multiplied — were what made China essentially China among the nations of the world. And Original Character Theory was born, quietly.

    (Once upon a time in China, sinographs weren’t thought of as being “quintessentially Chinese”. They were just writing. Was there any other kind?)

    — On a more spiritual level, traditional ancestor worship and clan concepts had translated to the myth of a nation of millions of co-descendant blood cousins. If Chinese blood sprang from the loins of Godking Yeller, of course Chinese speech & Chinese words would’ve sprung from a common source.

    — Meanwhile, sinographic Hoklo has always been chiefly a thing of the merchant class (incl. show business) and of skilled workers. It is secular.

    — The brahmins dipped into sinographic Hoklo when they needed to write the name of a place, or words for local produce, or the name of a peasant-criminal, etc. These things bore a 土 (thó͘) tag — LOCAL.

    — Just as an older woman cannot wash the underthings of a younger woman in some African cultures, a learned man — a lexicographer, perhaps — cannot fully acknowledge (let alone catalog) the secular, profane usages of merchants & workers.

    — This is why the Formosan literati in the 1930s talked about inventing sinographic Taioanese as if it did not already exist. This is also why modern brahmins cynically refer to sinographic Taioanese as “songbook graphs”, to put it in its place. THIS IS WHY no modern Hoklo wordbook has ever straightforwardly documented the existing sinographic usages, as would be expected with a dictionary of most languages that have writing.

    (The pre-modern Spanish-Hokkien DICTIONARIO HISPANICO SINICUM did just that, although some entries seem to have been compromised at some point in the copy reprinted by Fabio Lee et al. Non-十五音 rimebooks of the 1800s also documented vernacular sinographic usage, más o menos; the more “commercial” a rimebook was, seemingly, the more vernacular graphs it would take in. 蔡俊明’s 潮州方言詞匯, a 1991 Teochew-Mandarin dictionary, is a modern exception, or comes very close. MacGowan’s 1883 English-Hokkien dictionary was more than half way there as well.)

    — At a nuts-bolts level, to straighten things out some: An Original Character doesn’t have to be ancient, “preexisting”, etc., although antiquity is a Neo-Chinese turn-on to a point. It just has to be compatible with The Lineage [of Standard Chinese], which includes Standard Chinese itself.

    — As an example off top, take the word KÀU, customarily 到 in sinographs, meaning ENOUGH or TO ARRIVE. The Original Character guys unanimously prefer 夠 or 够 for KÀU when it means ENOUGH, apparently based on 夠 (够) being used for a Mandarin word that appears (and may well be) cognate to KÀU. The graph didn’t exist before Song times; in Song times it was apparently a dialect graph, not used in the koine. But it’s used in Mandarin, so, for them, it overrides whatever was used in Hoklo.

    (Interestingly, most Orig. Char. guys seem to prefer using a different graph (e.g. 遘) for KÀU when it means TO ARRIVE, even though there is no indication that the two KÀUs are etymologically distinct. There are also usages where we don’t know “which KÀU” is being used. Orig. Char. Theory purportedly champions “etymology” and even “science”, but examples like KÀU prove the non-secular, belief-based nature of the endeavor….)

    — So the Original Character concept is as modern as the airplane, and implicitly centered on the modern national language of China. It’s situated in an intensely hierarchical complex of beliefs.

    — The supposed shared original Chineseness of all Chinese topolects mirrors the supposed shared original Chineseness of all Chinese people. So, to suggest the non-Chineseness of a Taioanese word is to suggest that Taioanese speakers are part bastard. This is why the Orig. Char. guys believe there’s no such thing as an Orig. Char.-less word element, only word elements whose Orig. Char. has yet to be found.

    (The more sophisticated may allow that certain spoken words were unwritten in Old China; in that case there might be no Orig. Char., but there’d still be an Orig. Etymon, used by Godking Yeller or his people.)

    — Loanwords are allowed, esp. if loaned from French, Other European (incl. English), or Japanese. If the word is from Cantonese, a respectful passive-aggressive compliment of some kind might be in order. If the word is from Malay or Tagalog, the brahmins in the room will pretend they didn’t hear / didn’t get the memo. People like to share that SAP-BÛN for SOAP — a loan from either Malay or Tagalog — was borrowed from French. In any case, loans are not existentially threatening.

    — BTW, 本字 more traditionally means THE ROOT GRAPH OF A DERIVED GRAPH. For ex., 要 is 本字 of 腰. This usage makes logical sense. The Orig. Char. usage is a misnomer even on its own terms.

    — What’s also interesting is that in describing Orig. Character Theory as “the ardent belief that for every Sinitic morpheme there is a corresponding … Sinograph”, you seem to be using “Sinitic morpheme” as a shorthand for “morphemes in a Sinitic language”…. Whereas Hoklo is FULL of etyma that might be called, in common scholarly usage, “non-Sinitic”, and this is part of why so many Orig. Chars. remain missing for Hoklo.

    — Again, we come to the question of what “Sinitic” is. What makes a language “Sinitic”? What would happen if we got that out of the way?

    — As for writing, as others have pointed out, there’s no such thing as an unwritable morpheme, even with sinographs. You borrow a graph — maybe sound-wise — and take it from there.

    — The Original Character fetish is a facet of Neo-Chinese nationalism. It is not inherent to the use of sinographs.

    — Remember, modern Japanese kana (besides maybe a couple) started out as sinographs borrowed sound-wise. Kana are visually specialised sinographs. 奈良 (Nara) is just なら without the visual specialisation.

    — Alongside a wide array of meaning-borrowed graphs, “Hoklo Nôm”, and “orthodox” usages, sinographic Taioanese makes heavy use of sound-borrowed graphs. And choice of graph has traditionally been much more uniform than seen with sound-borrowed graphs in Mandarin or even (by a small margin) Hokkien.

    — What this means is that Taioanese has kana. The sound-borrowed sinographs in Taioanese are just “Taioan kana” without the visual specialisation. These “Taioanese kana” beg to be firmed up and visually specialised. (Maybe this is the spirit of J.W. Brewer’s suggestion.)

    (The old-time government katakana for Taioanese are unwieldy, and nobody ever wrote with them. They have no advantage over the established romagraphs, not even aesthetically. The tone marks were directly borrowed from romanised Taioanese…. The gov’t katakana made sense for their one purpose, which was to help Japanese civil servants learn Taioanese.)

    — So an evolved, specialised form of the sound-borrowed graphs in sinographic Taioanese would improve the system and make it as tight as, and hopefully tighter than, the Japanese writing system.

    — Imagine sinographic Taioanese, fortified with its own in-house kana, coexisting with romagraphic Taioanese. To each their own.

    — “Not Sinitic enough” (@A'ióng) is not an issue. What makes (even) Japanese kana widely acceptable for writing Taioanese (in a way that Jonathan Smith might call “informally-conventionalized”) is not that they’re specialised sinographs; most Taioanese speakers aren’t really aware of this. Key difference, unfortunately, is that Japan has ruled Formosa, but Korea hasn’t.

    — Let me close by asking this again: What is “Sinitic”? What do we mean by this word that we keep using?

  16. Peter Cyrus said,

    March 14, 2025 @ 4:35 am

    @David Marjanović: Musa Fangzi gait is very similar to Hangeul in terms of use of space, number of elements, etc. I don't find Hangeul difficult to write. But Hangeul doesn't indicate tone.

  17. Jonathan Smith said,

    March 14, 2025 @ 6:38 pm

    The term Sinitic is unproblematic in reference to a putative family of related languages, and note it's totally normal that membership/structure of such should be controversial or unclear in some respects. But yeah, Sinitic affiliation needn't have any bearing on the question of writing: (hopefully?) no one is concerned whether some word BLAH of some Zhuang language is/isn't a deep borrowing from (or loan into) some Sinitic language such that the two words should/shouldn't be written identically in the respective Sawndip and Hanzi scripts — and so should it be for (e.g.) Taiwanese. The fact that such ortho-orthographical concerns are so prominent in the case of (e.g.) Taiwanese is — as Kirinputra suggests — a function of the "Hua" ethnic fantasy.

    So in a complete vacuum — whatever script is of course fine for Taiwanese.

    In the "Chineseness" dream world — ortho-Hanzi are the only way to go if Taiwanese really really must be written to dick around now and again… but why not simplify matters and just use Mandarin across the emerging ethno-state huh?

    In the alternative-timeline independent Formosa — presumably the folk Koa-á-chheh type character script. At least in the sub-timeline in which the missionaries are repelled.

    In the current world — Romanized is probably best simply because, as many a heritage speaker has noted, it compels one to clearer recognition of facts of the language many of which one kinda knew but didn't actually know so great turns out while also (crucially!) blocking cribbing from Mandarin via character forms. So I say that the really urgent problem is the one affecting the *language* i.e. *talking* community.

  18. Chas Belov said,

    March 15, 2025 @ 2:36 am

    @Victor:

    Don't use "èryǐzi" among polite company.

    Fortunately or unfortunately, I don't speak Mandarin so I wouldn't even use it among impolite company.

  19. Chris Button said,

    March 15, 2025 @ 7:48 am

    @ Kirinputra

    I read this on wikipedia:

    "In Northern Hokkien dialects where the final -o̤ /ə/ is present, it is generally realized as [ɤ̟], and -o /o/ is realized as [o̜]. In dialects where -o̤ /ə/ is absent, [ɤ̹] is a possible realization of -o /o/."

    I take it to mean that advanced [ɤ̟] as /ə/ versus more-rounded [ɤ̹] or less-rounded [o̜] as /o/ are not consistently distinguished.

    Could you shed any more light on the history of this ə ~ o alternation?

  20. KIRINPUTRA said,

    March 15, 2025 @ 10:51 am

    @ Chris Button

    Sure. I hope I CAN. I’ll have to be vague in places where my knowledge doesn’t allow me to be more specific.

    (This gets at the tendency of human languages to put resources to use.) So in Hoklo dialects that lack (P.O.J. romanisation) -O̤, that mid-central vowel space might be claimed by other rime(s). (We could maybe get away with saying “phoneme”, but I think “rime” is more truthful or enlightening in the Mainland S.E. Asian context.)

    So Northern Hokkien — not used to this wording, but it makes sense — -O̤ corresponds to Teochew -O and -OE (actually -UE in the Teochew “P.U.J.”); meanwhile, Teochew -Ṳ has mid-central variants. And -O̤ corresponds to Southern Taioanese -OE and -E, while modern Southern Taioanese -O is phonetically a mid-central vowel. Also, -O̤ corresponds to Amoy Hokkien -E, and apparently Amoy -O has tended towards a mid-central realisation for some time.

    (Some scholars really like to point out that -O̤ as heard in a couple of “up-island” Taioanese dialects is phonetically higher (?) or more to the front (?) than Southern -O.)

    The mid-central space is unclaimed by any single-vowel rime in “canonical”, non-Southern mainstream Taioanese (although a mid-central vowel does widely occur in -ENG & -EK, for instance, and -NG for some speakers), but Southern Taioanese — meaning “southwestern”, (very) roughly from the city of Kagī 嘉義 on down — has been the prestige dialect of Taioanese since c. the late 1980s, so you’d expect the mid-central realisation of -O to spread up-island. I think it has, but Fon & Khoo (recent) had really interesting findings on this.

    BTW this topic comes up a lot when literate (in Taioanese) Taioanese speakers compare dialects. “Traditionally”, up-island speakers would model the (their) -O / -O͘ distinction, and Southern speakers might (good-naturedly) say they couldn’t hear the difference. Now that an -O / -O͘ merger is well under way in some Northern cities, mutual respect is eroding, and a Southern mid-central -O is sometimes prescribed for younger Northerners looking to improve.

    BTW, I’m not sure how many vowels Proto-Hoklo had, phonemically, but, “pan-dialectally” (and pan-linguistically), modern Hoklo observes a nine-way contrast: i, ṳ, u, e, o̤, o, æ (ɛ), a, o͘ — although -O͘ is just -OU diachronically. (In other words, a pan-Hoklo romanisation must account for these nine single-vowel rimes, except for -O͘, which can be otherwise analysed.) This nine-way contrast, or some variation on it, seems to prevail throughout most of coastal Mainland S.E. Asia. The general lack of (high to mid) central vowels in Southern Hokkien (which rubbed off on most dialects of Taioanese at one point) seems to have been a contact phenomenon involving neighboring languages such as Hakka, poss. in the context of shift. This would be interesting to know more about.

  21. KIRINPUTRA said,

    March 15, 2025 @ 11:22 am

    @ Jonathan Smith

    Romanised is great, and arguably best, in the current world. But you also need genuine Taioanese & Hakka sinography, to push (in the public sphere) “Chineseness dream world” back offshore. Or else there will soon be, as you may soon repeatedly agree, no spoken language left for romanised to write.

    So I say that the really urgent problem is the one affecting the *language* i.e. *talking* community.

    In one’s “Da’an”-of-Taipei bubble, yes. (You wouldn’t be the first or last to make such a mistake — just look on Reddit.)

    Even there, though, you don’t give up on having an army just b/c having a navy is more urgent. You work on both.

    folk Koa-á-chheh type character script

    What’s this? Do KOA-Á CHHEH have an exclusive script in your world?

  22. Chris Button said,

    March 15, 2025 @ 2:19 pm

    @Kirinputra

    “Traditionally”, up-island speakers would model the (their) -O / -O͘ distinction, and Southern speakers might (good-naturedly) say they couldn’t hear the difference.

    That's interesting. Although do you mean o / o̤ rather than o / o͘ ?

  23. Jonathan Smith said,

    March 15, 2025 @ 2:36 pm

    Well how would you characterize the reproductive-age Taiwanese-speaking community of oh say the heartland town of Táⁿ-káu a.k.a. Kaohsiung? "Not particularly robust" at the most generous, no? The language is not going to be transmitted at any viable scale to kids emerging from the womb now-ish, correct? If your assessment is more optimistic than that, *you* Kirinputra are in the bubble.

    But you know this… the issue seems to be that you're locked into an unfortunate Comrades-in-arms mindset in which any observation originating from outside the embattled community, even if superficially identical to something We say to Ourselves, is seen as in bad faith — the classic wanna-be-Progressive own-goal (we've scored them in spades in the U.S. of late.)

    Speaking of which — "folk Koa-á-chheh type character script" — why pretend this is unclear? Perhaps you can supply your preferred term for pre-ROC Hàn-jī-based Tâi-bûn, then I can use it, then you can object to that? And also — "there won't be a spoken language left to write": um, *my* point. ERGO, to the extent there are resources, they should IMO be assigned primarily to efforts at maintaining a community of SPEAKERS and only secondarily at stuff related to LITERACY: as all language teachers know, this latter is relatively easy to do (thus meriting far less attention / time / resources), whereas actually being natively good enough at a language to transmit it to a next generation? OK now that is hard.

    But you know all this.

  24. KIRINPUTRA said,

    March 15, 2025 @ 9:19 pm

    @ Chris Button

    No — -O [o] vs -O͘ [ɔ].

    (AFAIK, there is no Hoklo variety, past or present, with phonemic -O̤ alongside mid-central realisations of -O; -O̤ loss has been ongoing for at least a century, but it merges into -E at many or most locales or lineages, and splits to -E & -OE at some, maybe at the idiolectal level.)

  25. KIRINPUTRA said,

    March 15, 2025 @ 11:58 pm

    @ Chris Button

    Last comment applies mostly to Taioanese…. -O̤ corresponds to -E in Amoy & in "Northeastern" Hokkien (and from there the Philippines) — prob. also reflecting early-modern or 19th-cen. -O̤ loss. And Teochew -O (a rounded back vowel; I know of no exceptions) corresponds to both -O̤ and -O in "Northwestern" Hokkien. So you were on to something.

  26. KIRINPUTRA said,

    March 16, 2025 @ 12:30 pm

    @ Jonathan Smith

    Tranquilo, profesor. (Sorry, misposted elsewhere.)

    The SPEAKERS > LITERACY arg., often heard, is a strawman in this context. No tribe can stably maintain a Sprache in the 21st cen. — let alone see to its own political hygiene — w/o letters, unless there is no competing lettered Sprache. In other words, low literacy means no speakers eventually.

    If the tank is leaking, just adding fresh coolant ain’t gon help much for long.

    https://languagelog.ldc.upenn.edu/nll/?p=68447#comment-1628293

    "Not particularly robust" at the most generous, no?

    “Not robust” — now you’re making sense. This characterises things in a way that’s realistic beyond Taipei Basin + 桃園.

    This reality is taken into account in everything I & most others have said on this matter. Check your assumptions.

    (As for “heartland”, realise: Urban Takao — due in part to the concentration of Nationalist Chinese on the north side of town — is statistically the least Taioanese-speaking non-Hakka town in the Greater South, although not by much.)

    Perhaps you can supply your preferred term for pre-ROC Hàn-jī-based Tâi-bûn, then I can use it

    You almost nailed it with “Hàn-jī-based Tâi-bûn”; or “Tâi-bûn Hàn-jī” or “Tâi-oân Hàn-jī”, to refer to the graphs specifically. “Pre-ROC” misleads, since the customary TÂI-BÛN HÀN-JĪ are still on their feet and may outlast the Republic o’ China, God willing.

  27. Chris Button said,

    March 16, 2025 @ 2:41 pm

    @ Kirinputra

    Last comment applies mostly to Taioanese…. -O̤ corresponds to -E in Amoy & in "Northeastern" Hokkien (and from there the Philippines) — prob. also reflecting early-modern or 19th-cen. -O̤ loss. And Teochew -O (a rounded back vowel; I know of no exceptions) corresponds to both -O̤ and -O in "Northwestern" Hokkien. So you were on to something.

    I would have to look into the issue a lot more to suggest any kind of analysis (e.g. /ə/ was the predecessor of /o/, or /ə/ is colored in various ways, etc.).

    By coincidence, I have recently been pondering two things:

    1. The merger of Old Japanese /ə/ with /o/, which seems to be uncontroversial in the field now albeit with little attempt to address the phonetic change,

    2. The more controversial idea by Kortlandt and later Matasović that proto-Indo-European /o/ comes from earlier /ə/ (i.e., the classic e~o ablaut actually goes back to a~ə rather than the ə~a that Edwin Pulleyblank and others have proposed).

    I wonder if the world is trying to tell me something?

  28. don said,

    March 17, 2025 @ 10:53 am

    characters are cool though
