Translating from Classical Chinese / Literary Sinitic to Mandarin


For those who are unfamiliar with Classical Chinese (CC) / Literary Sinitic (LS), what I am about to demonstrate in this post may be completely revelatory.  Many outsiders to CC / LS operate under the misapprehension that — because they are both written with hanzi  漢字 / 汉字 ("Chinese characters / sinoglyphs") — anyone who can read Modern Standard Mandarin (MSM) ought to be able to read CC / LS texts without too much difficulty.

Nothing could be further from the truth.

How did this subject come up?

On the last day of 2023, I made this post: "The Miracle of Western Writing" (12/31/23). In it, I referred to Xī rú ěrmù zī 西儒耳目資, a book written by the Jesuit missionary, Nicolas Trigault (1577-1628), and translated the title as Aid to the Eyes and Ears of Western Literati. The first commenter, Philip Taylor, asked, "…is it really possible that Xī rú ěrmù zī (西儒耳目資) can mean 'Aid to the Eyes and Ears of Western Literati'? So much meaning packed into just five Hanzi?" To which I replied, "The 5 hanzi mean what the 5 capitalized English words indicate: Western Literati Ears Eyes Aid. That's basically how we read Classical Chinese / Literary Sinitic that is easy and straightforward." Philip then noted that he had asked GT and ChatGPT, and both of them were stumped. That's understandable, for CC / LS is a dead, classical language, completely different from the living, vernacular MSM that GT and ChatGPT are designed to render (cf. Sanskrit / Hindi and Latin / Italian [much less English]).

katarina then weighed in and properly said that GT and ChatGPT "are flummoxed because the five characters are in Classical Chinese, not in present-day Chinese, i.e., they are in Classical Chinese grammar. One of the beauties of Classical Chinese is its extreme brevity."  She further thanked me for the information about Matteo Ricci and Nicolas Trigault.  I was especially grateful that katarina said, "For a non-linguist, I am very glad this is Language Log and not Linguistics Log, because we can learn, through the LL postings, so much about language as well as about culture and civilization."

Whereupon Philip Anderson stated, "With Professor Mair’s explanation, I can see how those five characters express the meaning given. How would the same sense be expressed in modern Mandarin? Less tersely I assume."

At that point, I immediately organized an "experiment" among my MA and PhD students from China in my classes, asking them to translate "西儒耳目資" into pure, vernacular MSM. The results are as follows.

1.

gěi xīfāng xuézhě de tīng hànyǔ, kàn hànzì de fǔzhù cáiliào

给西方学者的听汉语、看汉字的辅助材料

"auxiliary materials for Western scholars to understand spoken Sinitic and read sinoglyphs"

2.

Preliminary grammatical analysis:

(wèi) xī rú (zhī) ěrmù (ér) zī

(为)西儒(之)耳目(而)资 

"(for) Western literati('s) ears eyes (and) aid"

or 

xī rú (zhī) ěrmù (xiāngyìng zhī) zī

西儒(之)耳目(相应之)资

"Western literati('s) ears eyes (corresponding) aid"

Translations:

a) 

gěi xīfāng xué rén/rú shì ěrduǒ (tīng yīn) hé yǎnjīng (shìzì) de bāngzhù

给西方学人/儒士耳朵(听音)和眼睛(识字)的帮助

"help for Western scholars / literati (for) their ears (to hear sounds) and their eyes (to recognize glyphs)" 

b) 

gěi xīfāng dú (zhōngwén) shū (de) rén suǒ tígōng de (jiéhé) shēngyīn yǔ zìxíng de bāngzhù

给西方读(中文)书(的)人所提供的(结合)声音与字形的帮助

"help for Westerners who are reading (Chinese) books that provides (combined) sounds and shapes of glyphs"

c)

bāngzhù xīfāng dúshūrén biàn yīn hé rèn zì de zīliào

帮助西方读书人辨音和认字的资料

"materials to help Western scholars distinguish sounds and recognize glyphs"

3.

bāngzhù xīfāng xuézhě (xuéxí zhōngwén) tīng shuō dú xiě (de shū)

幫助西方學者(學習中文)聽說讀寫(的書)

"(a book) to help Western scholars (learning Chinese) to hear and speak and to read and write"

4.

bāngzhù xīfāng xuézhě de zīliào

帮助西方学者的资料

"material for helping Western scholars" (she completely left out "ears" and "eyes")

5.

bāngzhù xīfāng chuánjiào shì yòng ěrduǒ tīng (hànzì yīnyùn) yòng yǎnjīng pīndú (hànzì dúyīn)

帮助西方传教士用耳朵听(汉字音韵)用眼睛拼读(汉字读音)

"Help for Western missionaries to use their ears to hear (the phonology of sinoglyphs) and use their eyes to spell out (the sounds of sinoglyphs)"

6.

xīfāng xuézhě (yòng lái) zēngzhǎng jiànwén de zīliào

西方学者(用来)增长见闻的资料

"materials (for use) by Western scholars to increase their seeing and hearing" — ěrmù 耳目 ("ear eye") here might be a metonym for shìtīng 视听 ("see hear") or jiànwén 见闻 ("look listen")

7.

gěi xīfāng xuézhě de zhùyīn shū

給西方學者的注音書

"a phonetically annotated book for Western scholars" — One realizes how concise and economical classical Chinese is.

This experiment demonstrates an argument I've been trying to make for the last half-century, namely, it is just as challenging to translate Classical Chinese / Literary Sinitic into Modern Standard Mandarin (MSM), or any Sinitic topolect for that matter, as it is to translate Classical Chinese / Literary Sinitic into Russian, German, Arabic, Hebrew, Bahasa Melayu, Hindi, Telugu, English, French…, or any modern vernacular language — if you're doing an honest job of it.  But most people just cop out and copy large chunks of the classical / literary language into the vernacular, and I have to remind them that they haven't really understood the text they are supposedly translating.

 

Selected readings

Over the years, there have been dozens of Language Log posts comparing classical / literary and vernacular.  One can say that, in a sense, they have come to a head here.

 

[Thanks to Zhaofei Chen, Shuheng Zhang, Xiaoqiao Zhu, Qianheng Jiang, Xiaohan Wang, Yifei Yang, and Zihan Guo]



60 Comments

  1. Arthur Waldron said,

    January 14, 2024 @ 11:09 am

    William Hung 洪業 required his Chinese-speaking students to read Four Books in English; English-speaking students to read them in Chinese. Translation forces issues otherwise easily left unsolved. In contemporary Chinese publishing Classical Chinese is often quoted without translation. Very bad for understanding meaning.

  2. J.M.G.N said,

    January 14, 2024 @ 2:36 pm

    In the light of your post, it's clear that "Classical Chinese" is not a fitting name.

  3. Victor Mair said,

    January 14, 2024 @ 8:27 pm

    @J.M.G.N :

    How so?

    What do you think a more fitting name would be?

  4. AntC said,

    January 15, 2024 @ 4:26 am

    Do these 5 Hanzi represent a language that would be spoken? Or is it more like a military/administrative shorthand? The pile-up of nouns suggest to me those indigestible clumps of English managementspeak — whose whole purpose is to obfuscate.

    The only one in Prof Mair’s gloss that’s verb-ish is ‘Aid’. How do we know the ‘Literati’ are being aided rather than aiding (someone)? How do we know who the ‘Eye(s)’ and ‘Ear(s)’ belong to? The Literati or those they’re aiding?

    Could that phrase in a sufficiently strong context mean: Aid for Scholars (teaching) Western Eyes and Ears ? (That is, for Chinese Scholars.)

    I see that the volunteered translations (or rather, Prof Mair’s translations of the volunteered MSM phrasings) have inserted prepositions and/or finite verbs to indicate subject vs object vs possession. They’ve taken (I guess) the most parsimonious interpretation of the 5 Hanzi; but are those the only possible interpretations?

    Contrast Classical Latin phrases (Senatus Populusque Romanus, Pro Publica), which, although also written in a formulaic shorthand, at least include prepositions/enclitics/inflections so you can have a fair stab at how the nominals relate to each other.

  5. AntC said,

    January 15, 2024 @ 4:38 am

    @J.M.G.N.: "'Classical Chinese' is not a fitting name."

    I also am not following. “Classical” as in “Classical Latin” of Rome, surely? Not the Latin as she was spoke by conscripts on the windswept Hadrian’s Wall or indeed the teeming outskirts of Rome. Not the Latin of the Mediaeval Churchmen or Scholastics; nor of the European Enlightenment’s lingua franca. Not the Latin -> Italian/Spanish/Portuguese gazillions of dialects after the collapse of the Empire.

    (Easy to find non-Classical Chinese parallels for all of those.)

  6. Michael Carasik said,

    January 15, 2024 @ 10:11 am

    There's a similar phenomenon of Israeli scholars of classical Jewish texts simply quoting them as if their meaning were straightforward. I gave kudos to Yairah Amit in my review of her commentary to Judges in the מקרא לישראל series because she translated the difficult passages into Modern Hebrew. (But the translation of the entire Bible into Modern Hebrew that was getting some ink a few years ago did not impress me.)

  7. Peter Grubtal said,

    January 15, 2024 @ 10:23 am

    Regular visitors will be well aware of Prof. Mair's vigorous aversion to referring to hanzi as ideograms. With modern Chinese perhaps his opinion may have some justification, but the whole of the above seems to indicate that in classical Chinese that is exactly what they are.

    AntC : yes, these labels are conventional, but useful. Yes, we know that what was spoken on the street will have differed somewhat from Cicero's orations, but the essential of classical Latin as we understand it was the inflectional system, and from what remains in Italian and Spanish it looks probable that for verbs at least it was still current, colloquially as well, until the early medieval period. When the substantives lost their inflections – apart from for gender and number – I think is not well known, but I would make that the break between classical Latin and romance languages.

  8. Richard Futrell said,

    January 15, 2024 @ 11:15 am

    ChatGPT isn’t at all flummoxed by the book title:

    User
    Translate this 16th century book title to English: 西儒耳目資

    ChatGPT
    The title "西儒耳目資" from the 16th century can be translated to English as "Western Scholar's Audio-Visual Aids." This title suggests that the book might be a collection of resources or materials intended for the education or enlightenment of scholars, with a focus on Western knowledge or perspectives, especially considering the historical context of the 16th century.

  9. Jim Unger said,

    January 15, 2024 @ 11:29 am

    For a detailed discussion of "classical" as applied to Latin, see Nicholas Ostler's Ad Infinitum: a Biography of Latin (2007). As for the term ideogram, both the late John DeFrancis and I have written extensively on the many reasons that a truly ideographic writing system is impossible. There are legitimate uses of the term logogram, but it is more accurate to describe the units of a writing system as functioning more or less logographically or phonographically depending on the context in which they are embedded. Describing entire writing systems as "logographic" or "phonographic" is highly misleading unless one is talking about cryptographic codes or IPA transcriptions, which are purposely contrived to be as nearly logographic or phonographic, respectively, as possible.

  10. John Swindle said,

    January 15, 2024 @ 5:21 pm

    @AntC: 资 "aid, resource" is indeed verb-like, but 西儒耳目资 (literally "west[ern] scholar ear eye aid") does not and need not contain a verb. To me the syntax feels almost English-like. I don't think it would be open to interpretation as Western scholars aiding someone else. "West[ern]" modifies "scholar," so it can't be unspecified scholars teaching Westerners. I do think (others may correct me) that it could mean an eye and ear aid prepared by a Western scholar or scholars.

    It's interesting that in English it's almost always eye and ear, whereas in Chinese it's the other way around. Like North and South.

  11. liuyao said,

    January 15, 2024 @ 8:53 pm

    I'm happy to see bilingual editions of Chinese classics published in recent years.

    It is plausible that the earliest scribes of Chinese language(s) would have omitted particles and grammar-functioning inflexions (tones or finals), precisely because the characters are sufficient to get the meaning across. Perhaps when read out, filler words would be added back in.

    More extreme form of this brevity is in Chinese poetry, which sometimes does read like a list of nouns and verbs:

    枯藤老树昏鸦
    小桥流水人家
    古道西风瘦马
    夕阳西下
    断肠人在天涯

  12. AntC said,

    January 16, 2024 @ 3:26 am

    @John S: "does not and need not contain a verb."

    As a book title, indeed it does not/neither in English. So perhaps we should rather be considering a paragraph from the book?

    "To me the syntax feels almost English-like."

    Hmm. Book titles _aren’t_ necessarily in English — that is, not qua sentences. I said those pile-ups put me in mind of managementspeak — which I don’t count as English.

    @liuyao: "poetry, which sometimes does read like a list of nouns and verbs"

    At risk of sounding culturally imperialist, English ‘poetry’ with only nouns and verbs I’d give very few marks for effort. It might bring together pretty ideas; it might sound pretty — lost in translation, of course; but I’d say the poet has failed to convey any stimulating thought. (After all, anybody can go for a walk in the countryside. But can they evoke the experience in language to someone who hasn’t visited that place?)

  13. John Swindle said,

    January 16, 2024 @ 5:50 am

    @AntC: Yes, book-title Chinese, book-title English. I should have been clearer.

  14. Victor Mair said,

    January 16, 2024 @ 8:08 am

    Of all of the hundreds of thousands of extant Chinese poems, the one quoted by liuyao is among my top two or three favorites. It is by Ma Zhiyuan (1250?-1323? [during the Mongol period]):

    "Tiān jìng shā · qiūsī" Mǎ Zhìyuǎn

    《天淨沙·秋思》馬致遠

    Kū téng lǎo shù hūn yā
    枯藤老樹昏鴉

    Xiǎo qiáo liúshuǐ rénjiā
    小橋流水人家

    Gǔdào xīfēng shòu mǎ
    古道西風瘦馬

    Xīyáng xī xià
    夕陽西下

    Duàncháng rén zài tiānyá
    斷腸人在天涯.

    And this is my English translation:

    Tune: "Heaven-Cleansed Sands"
    Autumn Thoughts

    Withered wisteria, old tree, darkling crows —
    Little bridge over flowing water by someone's house —
    Emaciated horse on an ancient road in the western wind —
    Evening sun setting in the west —
    Broken-hearted man on the horizon.

    Does the English translation fail to convey the timelessness of the Chinese original? When I read this poem — whether in Chinese or in English — I often think of the images as existing in an eternal present.

    See "Poetic Angst over Time and Tense" (4/24/11)

    https://languagelog.ldc.upenn.edu/nll/?p=3108

    BTW, in the whole of the 5 lines and 28 glyphs of the Ma Zhiyuan poem, there is only one glyph that functions as a verb, and it is one that we normally don't think of as having verbal properties.
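The line and glyph totals are easy to check mechanically. The following is only a minimal sketch, assuming the traditional-character text quoted above and counting one glyph per Unicode code point (the names `POEM` and `glyph_counts` are made up for illustration):

```python
# Lines of Ma Zhiyuan's "Autumn Thoughts" as quoted above (traditional characters).
POEM = [
    "枯藤老樹昏鴉",
    "小橋流水人家",
    "古道西風瘦馬",
    "夕陽西下",
    "斷腸人在天涯",
]


def glyph_counts(lines):
    """Return the number of glyphs (code points) in each line."""
    return [len(line) for line in lines]


counts = glyph_counts(POEM)
total = sum(counts)
print(counts, total)  # [6, 6, 6, 4, 6] 28
```

Counting code points works here because every character in the poem is a single code point; text containing combining marks would need a different counting rule.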

  15. Jerry Packard said,

    January 16, 2024 @ 8:37 am

    It is worth pointing out that in addition to being a book title, another factor contributing to the terse brevity is the fact that at that time, the written language used mostly single-syllable words, as seen in the title itself. Also, it is likely that the title would not be considered an instance of the language spoken (vs. written) at the time, and by a non-native speaker to boot.

  16. Jonathan Smith said,

    January 16, 2024 @ 10:58 am

    Re: Wenyanwen (in contradistinction to objects of academic study like, say, the language(s) of the Western Zhou Bronze inscriptions), the only interesting/meaningful thing at any given moment is the status of this "language" in the minds of the people who read/write it. These people have not in general learned by learning syntax in some explicit manner, certainly not by learning (largely inaccessible) phonology, and not even by learning lexical items per se as one does when learning, say, French as an English speaker.

    Instead they have learned by (fan4du2 泛读-style) reading, processing words of texts by reference to modern language (now mostly = Mandarin) words — thereby building a pseudo-Mandarin or "Mandarin+" that consists of items at times simply identified with Mandarin homographs, at times representing extended meanings of Mandarin homographs, at times constituting really novel "Mandarin+". So learning takes the form of observations like, oh, I had identified written form "N" with the Mandarin homograph X, but exposure or instruction reveals this to be an extension or a different item; let me modify my understanding of X slightly, or establish an item X1 in my Mandarin+ lexicon.

    WYW has always worked this way; it is like a trailing rake/harrow that can potentially gather all past lexical (also syntactic) detritus, with contemporary understandings and thus any newly-produced texts combining this collected stuff with words/rules of modern languages in a non-discriminating manner.

  17. Jerry Packard said,

    January 16, 2024 @ 12:04 pm

    @Jonathan
    I cringe at your use of the word ‘detritus.’

  18. Jarek Weckwerth said,

    January 16, 2024 @ 1:05 pm

    I, for one, think, a bit of linguistics could absolutely help here.

    I don't think AntC's original question has been answered in a satisfactory manner. (1) Can the original five items be classified as parts of speech? (2) If so, what are they? Is it really a Germanic-like noun pile-up? (3) Why is it, specifically, that another interpretation is not possible? Word order? (Cf. west scholar ear eye aid 'manual for Westerners' vs. west ear eye aid scholar 'Western ophthalmologist' vs. west aid ear eye scholar 'member of a Western aid organization that helps us with ear and eye diseases' vs. aid-west ear eye scholar 'our scholar that helps the west in ear and eye diseases'.) (4) Out of combinatoric mathematical curiosity (and note I know nothing about Chinese, and this is a bona fide question), assuming Classical Chinese was mainly single-syllable words, how many words were there? From what I've heard the phonotactic possibilities were only slightly more adventurous than Modern Chinese?

  19. katarina said,

    January 16, 2024 @ 2:16 pm

    Re @ Victor Mair

    It wouldn't take an Agatha Christie to figure out the meaning of the poem even if Prof. Mair's lovely translation were more laconic:

    Withered wisteria old tree darkling crows
    Little bridge flowing water someone's house
    Emaciated horse ancient road western wind
    Evening sun setting west
    Broken-hearted man on horizon

    The Chinese does have the word "on" (在).

    Both the Chinese and the English, here, each have 27 words, but the Chinese is more terse all the same, with 28 syllables/characters in Chinese to 44 syllables in English.

  20. Jerry Packard said,

    January 16, 2024 @ 2:18 pm

    “I, for one, think, a bit of linguistics could absolutely help here.”

    Happy to try.

    “I don't think AntC's original question has been answered in a satisfactory manner. (1) Can the original five items be classified as parts of speech?”

    Words in Classical Chinese are notoriously difficult to assign to a POS category. In a sense it is only a translation exercise, since the POS is something you are discussing relative to the translation exercise. If confined to Chinese (i.e., the Chinese of that time period), then POS is a less useful construct because it assumes a grammatical analysis that used certain posited POS categories for the language of that time.

    “(2) If so, what are they? Is it really a Germanic-like noun pile-up? (3) Why is it, specifically, that another interpretation is not possible?”

    It is indeed the ordering and positioning of words, the biggest factor being that in Chinese the modification order is generally modifier precedes modified, as you can see from the translations.

    “(4) Out of combinatoric mathematical curiosity (and note I know nothing about Chinese, and this is a bona fide question), assuming Classical Chinese was mainly single-syllable words, how many words were there?”

    A tough question for any language at any time. Basically this would be answered by performing an analysis of large samples of extant texts, mindful of the distinction between less and more colloquial texts.

    “From what I've heard the phonotactic possibilities were only slightly more adventurous than Modern Chinese?”

    That is true. But, one would need to keep in mind that modern Chinese made much greater use of ‘compounding’ than the language of Ricci’s time, offering a more productive word formation system.

  21. katarina said,

    January 16, 2024 @ 2:26 pm

    The Chinese is also more visually and spatially compact:

    枯藤老树昏鸦
    小桥流水人家
    古道西风瘦马
    夕阳西下
    断肠人在天涯

    Withered wisteria old tree darkling crows
    Little bridge flowing water someone's house
    Emaciated horse ancient road western wind
    Evening sun setting west
    Broken-hearted man on horizon

  22. Philip Taylor said,

    January 16, 2024 @ 5:23 pm

    Can we infer, with 100% certainty, that "人" (MSM: rén) in the last line translates to "man" ? Is not "人" gender-neutral, and therefore leaves the sex of the referent unspecified ?

  23. Jarek Weckwerth said,

    January 16, 2024 @ 5:51 pm

    @ Jerry Packard: Thank you, this is much appreciated! Where do I send the medal?

    But actually, three more. (I'm just a very simple phonetician.) (5) So if the order matters, would it be possible to achieve effects similar to my contrived English examples? In other words, how free is the word order? (6) You always need to work from a grammatical analysis; I don't quite see how that makes POS less useful… (7) As far as the number of words goes, I now realize I meant something like "theoretically possible number of phonotactically well-formed syllables". I can't remember the numbers posited for Mandarin off the top of my head, and I really need to go to sleep now, but I seem to remember it was shockingly low. (You know, from the point of view of a Slavic language like mine with all the insane clusters.)

  24. Victor Mair said,

    January 16, 2024 @ 7:33 pm

    If you take tones into account, there are about 1,300 possible syllables in MSM, only a little over 400 syllables when tone is ignored.

    There are 133 possible syllables in Japanese.

  25. Jerry Packard said,

    January 16, 2024 @ 8:10 pm

    “(5) So if the order matters, would it be possible to achieve effects similar to my contrived English examples? In other words, how free is the word order?”

    Order matters – the order is not free, though your contrived examples are less distinguished by order than by lexical semantics (e.g., ophthalmologist, disease are not possible interpretations).

    “(6) You always need to work from a grammatical analysis; I don't quite see how that makes POS less useful…”

    I meant that POS information for classical Chinese is critically dependent, as you say, on your grammatical analysis. Using modern grammatical analysis does seem intuitively correct (W.A.C.H. Dobson, Ted Pulleyblank and Harold Shadick have done good work here), while positing POS based on analyses contemporary to the production of that work is more difficult for us because of our poor understanding of those contemporary analyses, such as they are.

    “(7) As far as the number of words goes, I now realize I meant something like "theoretically possible number of phonotactically well-formed syllables". I can't remember the numbers posited for Mandarin off the top of my head, and I really need to go to sleep now, but I seem to remember it was shockingly low.”

    Modern Mandarin has abt 1274 phonotactically permitted syllables (abt 400 if you don’t incl tone), which seems low when compared to the 6-7000 syllables estimated for English but seems large compared to the 400 syllables of Japanese. The Mandarin of Ricci’s time had three fewer initials (no palatal series) but did have finals with the -m coda, so I don’t know if the number of phonotactically permitted syllables would have greatly differed.

  26. Victor Mair said,

    January 16, 2024 @ 9:20 pm

    We are privileged to have Jonathan Smith's comments on wényánwén 文言文 ("literary language writing"), WYW, also called by many other names in Sinitic languages, English, Japanese, Korean, and other languages. Of course, there are many other articles and treatises on WYW, some of them quite good for explaining what kind of written language this is, when it arose, how it works, and so forth.

    What sets JS's treatment of WYW apart is that it shows how practitioners of this type of written Sinitic learned it and employ it. He uses the expression fàndú 泛讀 ("extensive reading"), which is not a kind of systematic analysis of syntax, phonology, or lexicon. Using a good, linguistically sophisticated textbook such as Shadick's A First Course in Literary Chinese (in three volumes), I adopt a modified version of this method when I teach WYW to adult non-native learners. I give them an analytical foundation of the basics of WYW to jump them up to the level of a native savant at about the age of 10 or so. Then, when they ask me what to do next, I tell them just "to read as much as you can", i.e., fàndú 泛讀 ("extensive reading"), i.e., I turn them loose on the canon, giving them tips about what is good to read and what should be avoided. If they do that seriously for about ten years, by the time they're around 30 or so, they'll be pretty darn good at approximating native scholarly levels of reading and understanding WYW, though I don't advocate that they spend as much time as native literati do on memorizing texts. That would be a waste of their time and brain power, plus now we have computers that can search vast databases almost effortlessly.

    JS's short, concluding, third paragraph is brilliant, so I will simply repeat it here as something for interested Language Log readers to gnaw on:

    =====

    WYW has always worked this way; it is like a trailing rake/harrow that can potentially gather all past lexical (also syntactic) detritus, with contemporary understandings and thus any newly-produced texts combining this collected stuff with words/rules of modern languages in a non-discriminating manner.

    =====

    I also wish to acknowledge the stimulating discussion carried out by the other participants of the comments to this post.

  27. AntC said,

    January 17, 2024 @ 7:28 am

    Indeed. Thank you Prof Mair for facilitating such illuminating digressions.

    Re ‘Autumn Thoughts’, I have a couple of stylistic observations (not sure if these apply for the translation or the original):

    “someone’s house” seems weak. Doesn’t (nearly) every house belong to someone/ is lived in by someone? Is “someone” there to pad out the scansion or line up a rhyme? It would fit the bleak mood better to say “no-one’s house”/“an empty/abandoned house”.

    “Broken-hearted man” [Philip T has already asked whether we need to be gender-specific]. Since they’re “on the horizon”, we might reinforce the ineffability by saying “figure on the horizon”? Again, what’s “man/person” contributing?

    More generally, these seem to be five separate images, connected only by all being vaguely sombre/Autumnal, and rhyming — well, differing tones of the same vowel. (Again at risk of cultural imperialism, five lines with no variation in the rhyme would be considered rather unenterprising in English.) Contrast a sonnet’s rhyme scheme and thematic unity or even a scheme

    the first, second and fifth line rhyme, while the third and fourth lines are shorter and share a different rhyme.

  28. AntC said,

    January 17, 2024 @ 8:19 am

    @Jerry/Jarek, I think Part-of-Speech is not exactly what I was reaching for, but it’s in the right ball park. Note English has many homonyms which are ambiguously verb or noun, and no inflections to give any hints. Indeed “aid” is one. (It ‘feels’ more verb-ish to me, but that’s only an intuition to make a start on that book title.)

    The ‘modifier precedes modified’ is useful intel; then how does “scholar” ‘modify’ “ear” and how does “scholar ear” modify “eye”? The short answer is “ear eye (and)” form a unit ‘audio-visual’ that “scholar” ‘modifies’ by having in their head.

    So we need a notion of role (agent/patient/possessor/possessed) and an expectation of the relative ordering of participants for each role in the pile-up.

    Withered wisteria, old tree, darkling crows

    Now does the wisteria count as a tree, so the poem’s saying it’s both old and withered? Or are there two distinct woody whatevers in the landscape? And how do we tell this case apart linguistically from the “ear eye” pairing?

    Are we in the sense-land of the late Turner? There’s tree-lined railway-bridged river-ness or oriel-window cathedral-ness; but deliberately nothing more concrete.

  29. katarina said,

    January 17, 2024 @ 9:49 am

    Belated and heartfelt apologies to Prof. Mair for tampering with his translation of "Autumn Thoughts" (《天淨沙·秋思》). I should have illustrated my point with another Chinese poem.

  30. Jerry Packard said,

    January 17, 2024 @ 10:15 am

    The ordering is simple once constituency is determined. The constituency is

    1. [西[儒]] = western scholar (western modifies scholar)
    2. [[耳][目]] = ear and eye (ear and eye are coordinate – neither modifies the other)
    3. [资] = aid

    Then, 1 modifies 2 which modifies 3. In terms of bracketing:

    [[西[儒]][耳目][资]] = ‘aid (for the) eyes + ears (of) western scholars’

    That is, it is a Noun (aid) composed of a head noun (aid; no pun intended) modified by a (complex) Noun (eye+ear), modified by a (complex) noun (scholar modified by western).
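The bracketing above can be written down as a small nested structure. This is only an illustrative sketch: the tuple encoding, the labels "mod" and "coord", and the function names are assumptions made for the example, not a standard notation, and the English glosses are the ones used in this thread.

```python
# One-word English glosses for the five glyphs, as given in this thread.
GLOSS = {"西": "western", "儒": "scholar", "耳": "ear", "目": "eye", "资": "aid"}

# ("mod", modifier, head): the modifier precedes what it modifies.
# ("coord", left, right): the two items are coordinate; neither modifies the other.
WEST_SCHOLAR = ("mod", "西", "儒")
EAR_EYE = ("coord", "耳", "目")
TITLE = ("mod", ("mod", WEST_SCHOLAR, EAR_EYE), "资")  # 1 modifies 2, which modifies 3


def leaves(node):
    """Return the glyphs of a tree in their original left-to-right order."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def paraphrase(node):
    """Spell out a tree as a rough English noun phrase."""
    if isinstance(node, str):
        return GLOSS[node]
    kind, left, right = node
    if kind == "coord":
        return f"{paraphrase(left)} and {paraphrase(right)}"
    if isinstance(left, str):
        # A single-glyph modifier stays in front of its head, as in the Chinese.
        return f"{paraphrase(left)} {paraphrase(right)}"
    # A complex modifier is moved after the head with "of".
    return f"{paraphrase(right)} of {paraphrase(left)}"


print("".join(leaves(TITLE)))  # 西儒耳目资
print(paraphrase(TITLE))       # aid of ear and eye of western scholar
```

The tree keeps the original glyph order in its leaves, so the only information added by the analysis is the grouping and the modifier/coordinate labels; the word order of the English paraphrase is what changes.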

  31. Jarek Weckwerth said,

    January 17, 2024 @ 1:18 pm

    @ Jerry Packard: Thanks a million! You have helped a lot! This is getting really (unexpectedly) intriguing.

    However, if you need to determine constituency first, it just moves the question one level up. On what basis do you determine constituency if there are no relational markers of any sort? Am I right in thinking there's not even an and in 'ear and eye'?

    (And again, these are all bona fide questions. This is all well outside of my comfort zone; teaching even a little bit of syntax to first-years is often a challenge for me…)

    Is my hypothesis reasonable, then, that the determination of constituency starts from either (a) the semantics or (b) the co-occurrence of elements ("phraseology")? That is, 'ear and eye' are naturally coordinated semantically, and therefore neither connects to the adjacent elements (for lack of a better term) outside of the middle bit; or that they simply co-occur sufficiently often to count as a more likely bigram, with the same consequence? In other words, the sequence scholar ear is (a) semantically more "anomalous" (even though I could certainly concoct some metaphorical sense of, I dunno, 'professional headphones' or something; oh I see not many of those in the olden days) or (b) doesn't form a frequent bigram, probably because of (a).

    There are dozens and dozens of posts on here tagged as "garden path sentences" and the like, and most of those depend on the impossibility of determining POS for nouns and verbs in English from the form alone. In those cases, the disambiguation can usually be done on the basis of the rest of the string, or from the "real world" context. But sometimes you cannot disambiguate…

    (Also, the first element has been glossed as both 'western' and 'west' in this thread. Is it unambiguously an adjective?)

  32. Jarek Weckwerth said,

    January 17, 2024 @ 1:34 pm

    Oh, I forgot: On the number-of-syllables front: 1300 syllables is plenty if you can make words of two syllables; theoretically 1300^2, i.e. 1,690,000. You can build a serious civilization with that many words no problem. Ditto for 300 if you can make long words the way Japanese does.

    But not if your words are supposed to be single syllable only. It's probably an unanswerable question to ask how many words you need to invent paper or gunpowder, but I would guess more than 1300. Maybe 10,000? (That's apparently in the ballpark of a modern English-speaking 10-year-old.) This means massive, massive homophony if you only have 1300 phonological forms. Modern Mandarin is famous for homophony, so I assume Classical Chinese must have been far more extreme?
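The arithmetic in this comment can be spelled out as a short back-of-the-envelope calculation. The inputs are the rough figures mentioned in this thread (about 1,300 toned syllables, and a guessed vocabulary of 10,000 words), not measured values, and the variable names are made up for illustration.

```python
SYLLABLES_WITH_TONE = 1300  # rough MSM figure quoted in this thread
VOCABULARY = 10_000         # the commenter's guess, not a measured value

# Upper bound on distinct two-syllable forms if any syllable may follow any other.
two_syllable_forms = SYLLABLES_WITH_TONE ** 2

# Average number of words sharing one syllable if every word is monosyllabic.
avg_words_per_syllable = VOCABULARY / SYLLABLES_WITH_TONE

print(two_syllable_forms)                # 1690000
print(round(avg_words_per_syllable, 1))  # 7.7
```

Under these assumptions a purely monosyllabic vocabulary of that size would put between seven and eight words on each spoken syllable on average, which is the "massive homophony" the comment describes.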

  33. David Marjanović said,

    January 17, 2024 @ 2:13 pm

    Strunk 'n' White: "Omit needless words!"
    CC/LS: "Omit needless!"
    – source forgotten

    Modern Mandarin is famous for homophony, so I assume Classical Chinese must have been far more extreme?

    If you read it with the sound values of modern Mandarin, then yes, lots of things just collapse. For example, there are no less than 64 characters – potential Classical words – that are read today.

    But in the 16th century CC was pretty much only written.

    With Middle Chinese sound values (7th to 11th centuries or so), there's a lot less homophony and confusion.

  34. Jerry Packard said,

    January 17, 2024 @ 2:38 pm

    Where does one begin? I suppose from the comprehension of the native speaker. Such a speaker will view 西儒耳目资 and determine constituency by 1. bigram cooccurrence frequency (i.e., knowledge of the language by experience), 2. modifier precedes modified, and 3. plausibility.

    The garden-pathing you mention is a function of plausibility. The native speaker/hearer will be led down the garden path and usually be right unless the resultant parse is implausible at which point the speaker backtracks to the more plausible but less common interpretation.

    Regarding the number of words ‘needed to invent gunpowder,’ my colleague C-C. Cheng has written a paper claiming that the limit is about 10,000 words for any language.
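
The first criterion in this comment, bigram co-occurrence frequency, can be sketched as a toy procedure. Everything here is an assumption made for illustration: the function name `chunk_by_bigrams`, the `min_count` threshold, and above all the counts in `TOY_COUNTS`, which are invented and not drawn from any corpus. The sketch covers only criterion 1; "modifier precedes modified" and plausibility are not modelled.

```python
def chunk_by_bigrams(chars, counts, min_count=1):
    """Greedily pair adjacent characters, strongest bigram first.

    counts maps two-character strings to (hypothetical) frequencies.
    Pairs below min_count are left as single characters.
    """
    # Score every adjacent pair, then visit them from most to least frequent.
    pairs = sorted(
        ((counts.get(chars[i] + chars[i + 1], 0), i) for i in range(len(chars) - 1)),
        reverse=True,
    )
    used = set()    # positions already claimed by a pair
    starts = set()  # positions where a two-character chunk begins
    for count, i in pairs:
        if count < min_count:
            break
        if i in used or i + 1 in used:
            continue
        used.update((i, i + 1))
        starts.add(i)

    # Read the chunks off from left to right.
    chunks, i = [], 0
    while i < len(chars):
        if i in starts:
            chunks.append(chars[i] + chars[i + 1])
            i += 2
        else:
            chunks.append(chars[i])
            i += 1
    return chunks


# Invented counts for illustration only; not corpus data.
TOY_COUNTS = {"西儒": 3, "儒耳": 0, "耳目": 40, "目资": 1}

chunks = chunk_by_bigrams("西儒耳目资", TOY_COUNTS, min_count=2)
print(chunks)  # ['西儒', '耳目', '资']
```

With these made-up numbers the frequent pair 耳目 is grouped first, 西儒 second, and 资 is left over as the head that the preceding material modifies. Different counts would of course give a different grouping, which is the point of the plausibility check in criterion 3.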

  35. katarina said,

    January 17, 2024 @ 3:00 pm

    Here is a Chinese poem by Qiao Ji (乔吉) of the Yuan dynasty consisting of a pile-up of images. Every character is duplicated playfully. No verbs. Nightingales and swallows suggest loving pairs.

    莺莺燕燕春春
    ying ying yan yan chun chun
    nightingales nightingales swallows swallows spring spring

    花花柳柳真真
    hua hua liu liu zhen zhen
    flowers flowers willows willows beauties beauties

    事事风风韵韵
    shi shi feng feng yun yun
    always always charming charming

    娇娇嫩嫩
    jiao jiao nen nen
    delicate delicate tender tender

    停停当当人人
    ting ting dang dang ren ren
    fine fine proper proper persons persons

    "Ren ren" (persons persons) means "every one of them".

  36. Jarek Weckwerth said,

    January 17, 2024 @ 3:08 pm

    @ Jerry Packard: Thank you for confirming my suspicions!

    I think it's worth noting that frequencies are driven by semantics, so it's fair to say that that is the driving force.

    This has been the best discussion I've had on here in a long time!

    Regarding the 10,000 words thing: the limit for what?

  37. Jerry Packard said,

    January 17, 2024 @ 3:47 pm

    ‘Regarding the 10,000 words thing: the limit for what?’

    My mistake – 8,000 words

    Addressing the possible upper limit on the number of words in a language, C.-C. Cheng (1998) has proposed that the human lexicon has a de facto storage limit of approximately 8,000 elements. Cheng, C.-C. (1998). Quantification for understanding language cognition, in Ts’ou, B.K., Lai, T.B.Y., Chan, S.W.K., and Wang, W.S.-Y. (eds.), Quantitative and Computational Studies on the Chinese Language. Hong Kong: Language Information Sciences Research Center, City University of Hong Kong.

  38. Kingfisher said,

    January 17, 2024 @ 4:22 pm

    I don't think you are going to find any hard-and-fast linguistic rules under which 西儒耳目資 will always indicate "Aid to the Eyes and Ears of Literati"; you just have to infer things from the social context. 西儒耳目資 is concise in a way that "Xīrúěrmùzī" could never be, because the former has visual information in the characters that the latter does not (too homophonous); perhaps you could ape a "Japanese" version by writing the title as:
    西stern儒larsfor耳and目資

    Considering that the title was written by a missionary, I am tempted to assume they used "eyes and ears" as a subtle nod to Jesus' command that "he who has eyes to see, let him see; he who has ears to hear, let him hear".

  39. Kingfisher said,

    January 17, 2024 @ 4:40 pm

    We might well look at one of the opening sentences of Romance of the Three Kingdoms for something similarly interpretive:

    話說天下大勢
    分久必合
    合久必分

    The meaning that the author is conveying is:

    Let it be said that there is a general way of the world:
    That whatever has been long divided is bound to unite
    And whatever has been long united is bound to divide

    But literally what they are saying is:

    SAY/TALK/HEAVEN/UNDER/BIG/SITUATION
    SPLIT/LONG/MUST/UNITE
    UNITE/LONG/MUST/SPLIT

    And even those words are more precise than the original characters because I have arbitrarily chosen to interpret some of them as verbs (SPLIT, UNITE) rather than as nouns (DIVIDED, TOGETHER). What is the framework to get to the meaning from the literal interpretation and for deciding why all those "that"s and "there"s and "is"s? I'm not sure how to answer that.

  40. Philip Anderson said,

    January 17, 2024 @ 4:56 pm

    Firstly, thank you to Professor Mair for following up the previous discussion. As he said, there have been many threads relating to Classical/Literary Chinese here on Language Log, but it’s useful for an amateur like me to have a “worked example”.
    Regarding literary versus Classical language, I would say a literary language is any variety accepted for writing that differs from the spoken language; classical was originally used to describe Greek and Latin in their heyday, when an extensive literature was produced that is still studied, but the term has been extended to describe (a particular phase of) certain other (literary) languages: e.g. Arabic, Sanskrit, Chinese. These continued to be read and written long after they diverged from, or outlived, the spoken language.
    Since a literary language tends to be read rather than heard (although Welsh cynghanedd are composed to be heard), texts can be more complex; although it appears to me that Classical Chinese is particularly difficult.

  41. Jerry Packard said,

    January 17, 2024 @ 5:35 pm

    @Kingfisher
    But why must we decide anything about all those "that"s and "there"s and "is"s ? They are totally English, and there’s no reason to represent them except for modern translation.

    And the form class (part of speech) of 分 and 合 is also only relevant for us speakers of modern English – native speakers simply interpret it as written – there is no reason to think in terms of V or N.

  42. AntC said,

    January 17, 2024 @ 7:19 pm

    Since a literary language tends to be read …, texts can be more complex

    So the pleasure from reading is more like solving a cryptic crossword?

  43. Victor Mair said,

    January 17, 2024 @ 10:53 pm

    The philological and pedagogical methods of William Hung (Hóng Yè 洪業), are mentioned in the first comment to this post.

    Hung, who was the editor of the renowned "Harvard-Yenching Institute Sinological Index Series" when he was professor and dean at Yenching University in Peking during the 1920s, used to have his students translate LS/CC into English to make sure that they understood the texts they were reading.

    "How to teach Literary Sinitic / Classical Chinese" (9/6/18)

    "The importance of translation for learning Literary Sinitic" (6/27/21)

    "The value and validity of translation for learning classical languages" (12/9/22)

  44. Victor Mair said,

    January 17, 2024 @ 11:12 pm

    A couple of telling anecdotes about Hung.

    When he told his father, a classical scholar, that he was compiling the H-Y index series, and that translation into English was one of the methods he employed for teaching his students to read CC/LS, his father said to him, "Now any Tom, Dick, and Harry will be able to read the classics."

    Hung proudly proclaimed that there was only one person on earth who could read and understand his introduction to the Chronicle of Zuo (late 4th c. BC).

  45. Kingfisher said,

    January 17, 2024 @ 11:32 pm

    "Now any Tom, Dick, and Harry will be able to read the classics."

    As it should be. Within twenty years, all the dynastic histories will have English translations.

  46. Thomas said,

    January 18, 2024 @ 1:14 am

    All this discussion about the parsing of 西儒耳目資 reminds me of how profoundly confusing yet oddly satisfying I found the concepts of genitivus objectivus and genitivus subjectivus when I was learning Latin in school. “Amor matris” is a phrase up to interpretation with respect to who loves whom. When I learned Mandarin, I had the same feeling with unmarked passive constructions. However uneasy a logically thinking scholar might feel with this, apparently, these things work well in practice, but only because most communication is not that dense in unusual information. Hasn't the saying popped up on languagelog before that to be able to translate from Classical Chinese, one needs to know the meaning in advance?

  47. Philip Taylor said,

    January 18, 2024 @ 5:11 am

    "William Hung (Hóng Yè 洪業)" — Interesting, in that the phonological shift is the opposite of my uncle-by-marriage's name — his Vietnamese name is Vĩnh Hùng Lý, he is universally known as Hùng, but the vowel is far closer to /ɒ/ than to /ʌ/ or /ʊ/.

  48. Jonathan Smith said,

    January 18, 2024 @ 8:19 am

    FWIW, there is way too much going on above to be sensibly treated together as "Classical/Literary Chinese" —

    Written language of the "classical" period proper is relatively coherent wrt lexicon and esp. syntax — one could and should approach this language systematically and call it "Classical Chinese" or sth.

    Poetry of (say) Tang is a pseudo-popular form… it's full of vernacular elements and is relatively constrained wrt lexicon — IMO not "Literary Chinese" in the strict-ish and thus meaningful sense

    In my comments I was thinking of literary prose like that on display in (say) the intros to the book in question — here the "detritus" metaphor is useful for thinking about what does/can go on (that is to say, anything). So Kingfisher is right to say that there are no "hard-and-fast linguistic rules in which 西儒耳目資 will always indicate 'Aid to the Eyes and Ears of Literati'". Here and in similar cases it is easy to be fooled by the fact that everyone knows what this is supposed to mean; try working with similar material without the cheat code.

    But why the Romance of the Three Kingdoms example? This is not only highly vernacular but practically modern; highly approachable for the modern curious 12 yr old. Not "Literary Chinese" in the above sense.

  49. Kingfisher said,

    January 18, 2024 @ 12:44 pm

    @Jerry Packard
    I do not mean to say that the "that"s and such need to be represented in the writing of the original text, but the text does have a correct meaning in mind and those functions do have to factor into it, if only within the head of the person reading the text and parsing its meaning. I do not say that the reader has to specifically classify 分 or 合 as "verb" or "noun", but they do have to make a judgment as to what 分 and 合 mean in this instance from several possibilities.

    @Jonathan Smith
    You are presumably referring to the intros of the 西儒耳目資, like the 刻西儒耳目資序? Are you saying there is a difference in comprehension for those intros only by degree or by form? That is, whether understanding the intros would require specialized study which the Romance does not?

    字韻之學,非雕蟲埒也。三才之蘊,性命道德之奧,禮樂刑政之原,皆繫于此。宋司馬君實有云:「備萬物之體用者,莫過于字;包衆字之形聲者,莫過于韻。」誠重之矣。

    "The study of script and pronunciation is no mere hobby like engraving or cursive; the subtleties of the Three Elements (of Heaven, Earth, and Humanity), the intricacies of instinct and will or of self-conduct and force of personality, and the origins of rites and music or of governance and justice are all bound up in it. Well did Sima Junshi of Song (Sima Guang) appreciate it when he remarked that 'For organizing the form and purpose of all things, nothing surpasses script; for encompassing the full shape and sound of script, none is greater than pronunciation'."

  50. Jarek Weckwerth said,

    January 18, 2024 @ 1:21 pm

    @Jonathan Smith right to say that there are no "hard-and-fast linguistic rules in which 西儒耳目資 will always indicate 'Aid to the Eyes and Ears of Literati'"

    OK, so what's your opinion on the sub-discussion we had above with Jerry Packard re frequencies? Would you say it's OK to say there are no syntactic rules but there are — I dunno — suggestive semantic tendencies? (As in, "eye ear" is a more frequent bigram than "scholar eye" and therefore gets parsed together. BTW, has there been corpus research into Classical Chinese?)

  51. Jerry Packard said,

    January 18, 2024 @ 1:24 pm

    @Kingfisher
    “the text does have a correct meaning in mind and those functions do have to factor into it”

    I see no reason why.

    “they do have to make a judgment as to what 分 and 合 mean in this instance from several possibilities”

    Again I see no reason why.

  52. Jonathan Smith said,

    January 18, 2024 @ 6:23 pm

    @Kingfisher
    "Are you saying there is a difference in comprehension for those intros only by degree or by form?"

    The simple path to good writing is surely > have good ideas > express them in language via writing such that your audience may congress w/ them. Does the intro you quote above have ideas in it? IDK. I sense, as I tend to given such texts, that the author has read a lot of literary Chinese but not thunk much, yielding a ChatGPT-like Chinese Culture salad expressing the profundity that "script (?) and pronunciation (?) are like bodacious." IMO we don't even need to check down to the question of whether say "三才之蘊,性命道德之奧" really means (?) "the subtleties of the Three Elements (of Heaven, Earth, and Humanity) [and] the intricacies of instinct and will or of self-conduct and force of personality". So ya miles from "話說天下大勢,分久必合,合久必分" 'we hear say that Creation's great ebb and flow is given long rupture inexorably towards reunion and then back again' or however you prefer to translate — a nice clear idea due in no small part to its expression in terms of Actual Words People Say.

    @Jarek Weckwerth
    "OK, so what's your opinion on the sub-discussion we had above with Jerry Packard re frequencies?"
    I didn't have the question of frequency per se in mind TBH — just (some) reasons for ambiguity/unclarity in such texts. Here I will just make the simple point that, e.g., zi1 資 is not really an item of modern Chinese languages AFAIK. If you check sensible dictionaries (checking…), you find statements like "單字不成詞者 ,無義項" 'single characters/syllables/morphemes which do not (alone) constitute lexical items do not have entries' (sutian.moe.edu.tw). If you check a not-sensible dictionary (i.e. basically all others… checking…), it will provide what appears to be a definition but which has been generated via an inexplicit process of reverse engineering on actual lexical items often both modern and historical which contain the morpheme in question, thus in this case "[1] [n] property; wealth; capital; [2] [n] expenses; fees; charges; [3] [v] provide; supply; support; [4] [n] endowment; aptitude; natural ability; [5] [n] qualifications; record of service" (http://www.cantonese.sheik.co.uk).

    So one might think that when one studies an early Chinese text and encounters this item, one carefully checks how it is used / what it means in closely contemporary texts. But in fact not; generally one instead relies subconsciously on just the same reverse engineering process as is illustrated above. And in the case of a text like the one at hand (unlike with Classical Chinese material proper), there is every chance in the world that the author has chosen the "word" in question by these very same nebulous means.

    So TL;DR, zi1 here could mean any of the above listed things depending. I don't see why, to give one reasonable possibility, the title can't be understood as referring to a 'contribution from the POV of a Western literatus'. But IDK.

    Yes people have done corpus linguistics work on (subsets of) at least (the relatively coherent body of) pre-Qin literature. Tools like ctext.org and now many others have made this much more practicable than before. Don't know how much of value has been gleaned thus far though TBH…

  53. Jerry Packard said,

    January 18, 2024 @ 8:12 pm

    “ zi1 資 is not really an item of modern Chinese languages”

    Surely you jest (资本, 投资, 资源, 资料 etc)

  54. Jerry Packard said,

    January 18, 2024 @ 8:35 pm

    Oh I see, you mean as a free morpheme (word).

    But we really don’t know if it was a free morpheme in the 17th C either.

  55. Victor Mair said,

    January 18, 2024 @ 9:17 pm

    Cf. the famous Song period synthetic history, Zī zhì tōng jiàn 資治通鑑 (lit., ["aid governance comprehensive mirror"] Comprehensive Mirror for Aid in Governance), 1084 AD. See how things get shifted around.

    https://en.wikipedia.org/wiki/Zizhi_Tongjian

  56. Jarek Weckwerth said,

    January 21, 2024 @ 12:45 pm

    I think Mark Liberman's recent garden path post is a timely follow-up to this discussion:

    https://languagelog.ldc.upenn.edu/nll/?p=62294

  57. David Cowhig said,

    January 23, 2024 @ 1:33 am

    I have enjoyed experimenting with ChatGPT 3.5 for translation. From what little I understand of Large Language Models, they work by analogy on a large corpus without actually 'understanding' anything, so probably true AI doesn't exist. I do find that with some prompting it can do a bit better than without. For example:

    User:
    Translate into English: 西儒耳目資
    ChatGPT
    The Chinese phrase "西儒耳目資" translates into English as "Western Confucian Resources" or "Western Confucian Information."

    User:
    Translate from Classical Chinese into English: 西儒耳目資
    ChatGPT
    The Classical Chinese phrase "西儒耳目資" can be translated into English as "Western Confucian auditory and visual resources" or "Western Confucian information gathered through the ears and eyes."

    The second one is not bad, though it would have been better with "Western literati" rather than "Western Confucian".

    In my own humble experiments, less concise Classical Chinese texts with some context come out better. Since the large language model 'AIs' lack understanding, they lack the ability to 'see' alternative formulations. Perhaps, working from analogy within the corpus it has been given, the model picks out the highest-frequency alternatives.
    One of my experiments: "2023: ChatGPT3.5 Classical Chinese Translation Corrected"
    https://gaodawei.wordpress.com/2023/09/30/2023-chatgpt3-5-classical-chinese-translation-corrected/

  58. Student said,

    February 6, 2024 @ 9:43 am

    @Jerry Packard
    I can't find the source you cited, either on Google or my college's library database:
    Cheng, C.-C. (1998). Quantification for understanding language cognition, in Ts’ou, B.K., Lai, T.B.Y., Chan, S.W.K., and Wang, W.S.-Y. (eds.), Quantitative and Computational Studies on the Chinese Language. Hong Kong: Language Information Sciences Research Center, City University of Hong Kong.

    Is there another way to access this source? I would love to read the proposal, since the implications seem profound to me. If there is no way to access the proposal, how did your colleague arrive at the 8,000 word figure? Is the figure of a passive or active vocabulary? Does it include lemmas and affixes?

  59. Jerry Packard said,

    February 7, 2024 @ 8:56 am

    @student
    I’m afraid I divested myself of most academic books and papers, but you should be able to get it from Language Information Sciences Research Center at the City University of Hong Kong. Or you can try C-C himself at the U of Illinois.

    I don’t remember his methodology, but I think it was passive as well as active vocabulary, and included lemmas but not affixation.

  60. Victor Mair said,

    February 8, 2024 @ 10:41 am

    Thanks to Jerry Packard for the helpful reply.

    I also discussed this issue in "Lexical limits" (12/5/15) https://languagelog.ldc.upenn.edu/nll/?p=22655

    In that post, I cited Prof. Packard and Prof. Cheng a couple of times. That was followed by a vibrant discussion with many useful comments and valuable references, including a lengthy, detailed account of Prof. Cheng's work — with facts, figures, and quotations — by Jerry himself.

    This is essential reading for anyone interested in Prof. Cheng's studies on lexical limits.
