The politico-cultural implications of Taiwanese romanization
Which do you think is harder — reading and writing Taiwanese with characters (sinographs) or with romanization?
I maintain — and I have tried to show over the years — that it's much easier to read Taiwanese written with roman letters than with Chinese characters. The same is true of all vernacular Sinitic languages.
It is relatively easy for a speaker of Taiwanese to become literate in roman letters, not at all so in characters. See the posts under "Selected readings" below.
There have been many efforts to write Hokkien / Hoklo / Taiwanese dictionaries. Even today, I think that the monumental Chinese-English dictionary of the vernacular or spoken language of Amoy by Carstairs Douglas (1830-1877) remains the best.
The problem is that all lexicographers of vernacular Sinitic topolectal dictionaries who attempt to include sinographs get tripped up by them. In truth, there are many morphemes in these languages that lack verifiable sinographic equivalents, yet traditional Chinese scholars insist that all morphemes in "Sinitic" "dialects" ineluctably are connected to a specific sinograph. (Calling these languages "dialects" is a sick / sad joke, because dialects are supposed to be mutually intelligible, whereas, as we have recently seen in a life-and-death trial, not even Taishanese and Cantonese, much less Cantonese and Mandarin, can be proven to be "dialects" of a single language, even though that is a dogma subscribed to by countless legions.)
This is a barrier that the traditionalists pretend to get around by a sneaky subterfuge they call běnzì lǐlùn 本字理論 ("native / original character theory") that I have struggled against for decades.
What is this běnzì lǐlùn 本字理論 ("native / original character theory")? It is the ardent belief that for every Sinitic morpheme there is a corresponding, and, in the minds of many proponents of this theory, a preexisting, Sinograph.
There are countless morphemes in the host of Sinitic languages and topolects for which there are no known characters. I have written about this phenomenon scores of times on Language Log (see the "Selected readings" below for some examples). For the last four thousand years and more, innumerable morphemes have arisen and entered the Sinitic lexicon. Often we have no idea where these new morphemes came from, and frequently they come from non-Sinitic languages. Such being the case, how could there possibly be a preexisting Chinese character for them? There simply is no běnzì 本字 ("native / original character") for each and every morpheme in Sinitic. Quite the contrary, morphemes come first, and characters are devised to write them. In other words, in terms of the evolution / sequence of morphemes vs. graphemes, the former are preexisting and the latter are secondary.
When people notice that there is an unwritten / unwritable morpheme floating around in the verbal lexicon and they decide it's something they want to write down, they cannot just transcribe the sounds of the new morpheme (or word) as is done with languages that use a phonetic script. Rather, they either have to invent a completely new character or borrow another character that has the same or similar sound as the target, characterless / benziless morpheme.
In a way, the běnzì lǐlùn 本字理論 ("native / original character theory") under discussion here is the reverse of the educated guessing game where you have a character you don't know how to pronounce and often are not sure what it means, so you make a more or less "educated" guess how to pronounce this unknown character and what it means.
In both cases, it is wishful thinking. Such procedures are not at all scientific and should be laughed out of the courts (!) of phonology and orthography.
Nonetheless, all of this talk about běnzì lǐlùn 本字理論 ("native / original character theory") and guessing how to pronounce unknown characters takes me back to some pleasant, prolonged bǐzhàn 筆戰 ("pen / brush battles / wars") that I had during the 70s and 80s with an old Taiwanese scholar named Wu Shou-li, who was the most eminent authority on Fukienese of that era. Our polite polemics really were bǐzhàn 筆戰 ("pen / brush battles / wars"), because that was in the days before computers, and we had to write out our respective sides of the debate and send them through the mail.
I was delighted to find this nice article about Professor Wu online:
"The Tongue-Tied Fate of Wu Shou-li", by Chen Kwe-fang, translated by Phil Newell, with photos by Wang Wei-chang, Taiwan Panorama (December, 1989) [may no longer be available]
Professor Wu would say to me, "Professor Mair, I'm sure I can find the běnzì 本字 ('native / original characters') for every word in Fukienese, though I must admit that I haven't found them yet. So I have to keep looking." To which I would reply, "I respect your tenacity, Professor Wu, but I believe you could search for the rest of your life and you'd never find the běnzì 本字 for thousands of morphemes in Fukienese." These include even such very common ones as chhit-tho ("play"), for which people borrow 七桃 ("seven peaches") and other outlandish characters. See also the great dictionary of spoken Amoy by Carstairs Douglas, which has many entries lacking solidly established Sinographic forms.
And we would let it rest at that until the next round.
[This section has been adapted from parts of the first post in the list of "Selected readings" below.]
When Carstairs Douglas published his monumental dictionary of Amoy vernacular in 1873, there was not a single character in it. In 1923, Thomas Barclay (1849-1935) published a Supplement to Douglas's dictionary through the Commercial Press in Shanghai. Although Barclay added characters for many of the entries, he still left numerous entries without any characters assigned to them, and most of the completely new entries he added lacked characters. This is in sharp contrast to a bizarre dictionary compiled by the Department of Sinitic Topolects in the Institute for Chinese Languages and Script of Amoy University and published to great fanfare in 1982. All of its entries have characters and MSM pronunciations, and the entries are ordered by MSM pronunciation under head characters. All definitions are given in MSM and, indeed, the MSM elements of the dictionary are openly based on the well-known Xiandai Hanyu cidian (Dictionary of Modern Sinitic [i.e., Mandarin]). A sizable portion of the entries in this dictionary from Amoy (MSM Xiamen) University are not really authentic Southern Min terms at all, but are simply Mandarin words with Southern Min pronunciations added to them.
[This section has been adapted from a paragraph of the last post in the list of "Selected readings" below.]
Understandably, exponents of Taiwanese literacy have been eager to create a user-friendly, comprehensive dictionary for their language, but their proposals inevitably fall afoul of each other, as evidenced in this piece by Chén Cún 陳存:
Guāndiǎn tóushū: Qiǎnzé "Táiwān tái yǔ chángyòng cí cídiǎn" chōngmǎn miùwù
"Opinion Letter: Condemning the 'Taiwanese Common Words Dictionary' for being full of fallacies"
Storm Media (1/3/25)
A bugbear or stumbling block or whatever you want to call it that stands in the way of those who want to create a good dictionary of Taiwanese (or any other non-standard, topolectal variety of Sinitic) is what to do with the characters. Tracing backward in time to review the best dictionaries of Taiwanese, I find that, so far as I can tell, they are all arranged alphabetically. Here are two from the 1970s:
Hong Kong: Hong Kong Language Institute, 1973. Pp. xlvi + 305.
Taiwan: Maryknoll Language Service Center, 1976. Pp. iv + 946.
In any event, Bernard Embree's dictionary had many virtues, including the fact that he checked all his entries against those of the stellar works of Douglas and Barclay mentioned above, as well as against that of Ernest Tipson, Chinese-English pocket dictionary of the Amoy vernacular (Taichung: Maryknoll House, 1935).
Now it is past time for the next dictionary of Taiwanese, and, as we have seen above, there are squabbles. I asked a number of colleagues who are in the midst of compiling their candidates for this generation's new dictionary of Taiwanese for their opinion about the dispute. One for whose work I have high regard is A'ióng, who replied to me thus:
As I believe I've mentioned before, my brother's Taiwanese mother-in-law had been completely illiterate for the first half of her life, but after that had the good fortune to take classes in POJ.
Pe̍h-ōe-jī (/peɪweɪˈdʒiː/ pay-way-JEE; Taiwanese Hokkien: 白話字, pronounced [pe˩ˀ o̯e̞˩ d͡ʑi˧], lit. 'vernacular writing'; POJ), also known as Church Romanization, is an orthography used to write variants of Hokkien Southern Min, particularly Taiwanese and Amoy Hokkien, and it is widely employed as one of the writing systems for Southern Min. During its peak, it had hundreds of thousands of readers.
For the second half of her life, she was literate.
Mutatis mutandis, the same is true for all other topolects and languages in China. Having their own written language will enable the speakers of a non-Mandarin or non-Sinitic language to write their own literature and preserve / disseminate their own culture.
My own mother was a Christian, and she often spoke about "false belief". In deference to her, I would say that the dogma that all morphemes of Sinitic topolects and languages come with a predestined běnzì 本字 ("original character") is a "false belief".
That's as much as I want to say today. I think that Kirinputra will take up the thread of the search back in time beyond the advent of the ROC on Taiwan. I should mention that the promotion or promulgation of Taiwanese was illegal under the KMT in the early decades of the ROC. I knew people who were imprisoned for doing so.
Afterword: A tale of two tails
Yesterday as I was walking on the street with one of my students, she used the expression "èryǐzi". I told her I didn't understand that term. She said it means "a person with two tails". Hmm. I said, "Oh, you mean 'yīgè rén yǒu liǎng ge wěibā 一个人有两个尾巴' ('a person with two tails')?" "Yes", she replied, "but we don't pronounce '尾巴' as 'wěibā', we pronounce it as 'yǐba'." (Well, I [VHM] have been pronouncing "尾巴" as "wěibā" for more than half a century, but I didn't make a point of it with her.) She continued, "That's why we pronounce this term as 'èryǐzi'." After a while, together we figured out that she was trying to say a Chinese equivalent of "hermaphrodite", but one that is derogatory (it also means "sissy") and regional (north central China and slightly to the west). I'm simplifying things quite a bit, because it's late and I have to catch a flight tomorrow morning. Of course, Chinese also has more neutral, anatomical ways to say "hermaphrodite; intersex person". I bring up this "Tale of two tails" here because it illustrates how, when people are talking about something and bring up the characters, the same character can confuse them more than help them. And all of this was done using different topolects of "Mandarin".
BTW, this odd regionalism, "a person with two tails", is also a part of the vocabulary of Dungan speakers, who use Cyrillic to write their language, not sinographs, hence эрйизы. For Dungan, see this post, "'Thanks' in Hakka and other Sinitic topolects" (2/15/25), and many other Language Log posts, some of them listed in the "Selected readings" below.
This sort of thing happens countless times when you read, write, and speak Chinese languages and topolects. I sort of get used to it.
Selected readings
- "Morphemes without Sinographs" (11/18/21)
- "Sinitic topolects in a Canadian courtroom" (3/9/25)
- "Writing Taiwanese with Romanization" (10/7/20) — features Aiong; with long bibliography
- "Taiwan(ese) Taiwanese" (7/22/24)
- "Hokkien in Singapore" (9/16/16)
- "Hoklo" (9/18/16)
- "The importance of being and speaking Taiwanese" (7/21/20)
- "Mixed script writing in Taiwan" (5/24/24)
- "Mixed script writing in Taiwan, part 2" (5/29/24)
- "Hokkien at UCLA" (4/20/22)
- "Hokkien at UCLA, part 2" (4/22/22)
- "A crack in the hegemonic edifice of hanzi" (5/23/24)
- Taiwanese, Mandarin, and Taiwan's language situation
- "Writing Sinitic languages with phonetic scripts" (5/20/16)
- "Confessions of an Ex-Hokkien Creationist" (9/20/16)
- "The Opacity and Difficulty of the Chinese Script" (9/18/08)
- Christine Louise Lin, "The Presbyterian Church in Taiwan and the Advocacy of Local Autonomy", Sino-Platonic Papers, 92 (January, 1999), xiii + 136 (free pdf)
- Alvin Lin, "Writing Taiwanese: The Development of Modern Written Taiwanese", Sino-Platonic Papers, 89 (January, 1999), 4 + 41 + 4 (free pdf)
- Victor H. Mair, "How to Forget Your Mother Tongue and Remember Your National Language", Pīnyīn.info (2003)
Chester Draws said,
March 12, 2025 @ 11:02 pm
Over the centuries there have been many languages that have moved from characters towards alphabets. Has there been a single case of an alphabetic language voluntarily moving towards characters? The modern emoji is a bit of an example, but that's all I could think of.
The case for alphabets over characters is overwhelming.
Chas Belov said,
March 13, 2025 @ 1:14 am
Hope you made your flight. When you get a chance:
Regarding two tails, ¿how does wěibā 尾巴 for hermaphrodite differ from cíxióngtóngtǐ 雌雄同體 for hermaphrodite, or are they interchangeable?
Peter Cyrus said,
March 13, 2025 @ 6:25 am
May I make a suggestion? Instead of promoting romanization, please consider advocating the use of a better alphabet. The Roman alphabet doesn't have letters for many of the sounds needed, it doesn't show the division into syllables as well as characters, and it seems like an admission that Western civilization is superior.
You could take a look at musa.bet/zh for an example of what's feasible.
John Rohsenow said,
March 13, 2025 @ 6:31 am
Although I use the word "wěibā" for 'tail', I have always sung the children's song, that I suppose most foreign language learners of Mandarin learn, as:
Liǎng zhī lǎohǔ, liǎng zhī lǎohǔ; pǎo de kuài, pǎo de kuài;
yīzhī méiyǒu ěrduǒ, yīzhī méiyǒu *yǐba*; zhēn qíguài, zhēn qíguài.
[Two tigers, two tigers, running very fast, running very fast;
one has no ears, one has no tail; very strange, very strange!]
w/out thinking very much what the characters for 'yǐba' were/are.
Victor Mair said,
March 13, 2025 @ 6:59 am
@Peter Cyrus:
Most of the alphabets of the world don't have letters that match the sounds of the languages for which they are used one-for-one.
In terms of grammar, syntax, lexicon, etc., it's more important to join syllables into words and other meaningful units than to split them up.
The alphabet is not "an admission that Western civilization is superior." For starters, think of all the "bahasa" of Southeast Asia. "Bahasa and the concept of 'National Language'" (3/14/13)
Victor Mair said,
March 13, 2025 @ 7:02 am
@John Rohsenow:
Very nice! I learned that children's song too, and I loved it.
Victor Mair said,
March 13, 2025 @ 7:03 am
@Chas Belov:
Don't use "èryǐzi" among polite company.
Heading for the airport now.
Jonathan Smith said,
March 13, 2025 @ 7:35 am
"Write Taiwanese (or whatever language) in characters" is not at all the same thing as "try to find and employ so-called běnzì for every morpheme." It wouldn't matter how or how much the characters used to write say Taiwanese overlapped with those used to write say Mandarin or whatever else. A Taiwanese orthography is/would be a system unto itself where the only relationships that mattered would be those with words of Taiwanese.
There is no such thing as "verifiable sinographic equivalents" for the words of any language anywhere; this phrase doesn't make sense.
"Best" system to use would be "whatever they've been using" i.e. maintain convention. So e.g. to the extent Lô-má-jī was or remains conventional it would have been or would be nice to retain it. Same for the Koa-á-chheh character orthography. But if the conventions are dead or nearly so then you probably don't have a practical argument in these terms, just an emotional one — which would not be meaningless.
But again the question of orthography is neither here nor there absent a viable community of language users to sustain it. This is 99% of the problem wrt Taiwanese and all others except Cantonese (for now at least), where (at least when I was last looking over their shoulders) people write in a character-based informally-conventionalized system because they need/want to and generally without engaging in endless ideological debates about it; crazy how that works.
J.W. Brewer said,
March 13, 2025 @ 8:32 am
The Roman alphabet isn't even a good fit for English (or for several modern Romance languages). Reject the false dichotomy that it's kanji or romaji with no other options! Revive the extended version of katakana that was devised for writing Taiwanese phonemically almost a century ago! Or modify bopomofo to the same end. Stop catering to tourists. (The Presbyterians can of course continue to do as they like.)
If you really want a script of non-Asian origin, maybe Cyrillic-as-modified-for-Dungan can be readily further-modified for other Sinitic languages?
Mark Young said,
March 13, 2025 @ 8:57 am
@Victor: Sorry to disappoint you.
Bernard and Ainslie Embree were not brothers or first cousins on their fathers' sides. Ainslie's parents and grandparents were all born in Nova Scotia, while Bernard's father and paternal grandparents were born in the USA. Following Bernard's male line back leads from Washington to Missouri to Kentucky, none of which were significant sources of migration to or from Nova Scotia. Thus even second or third cousin is unlikely.
There is a (very) faint chance that Bernard and Ainslie were related on their mothers' sides, as Bernard's mother and her parents were from Nova Scotia. In that case the Embree name would just be a coincidence.