Spoken vs. written Sinitic


The gap between spoken and written Sinitic is enormous.  In my estimation, it is greater than for any other language I know.  The following are some notes by Ľuboš Gajdoš about why this is so.

"The Discrepancy Between Spoken and Written Chinese — Methodological Notes on Linguistics", Comenius University in Bratislava, Department of East Asian Studies

The choice of language data on which synchronic linguistic research is based appears in many ways to be relevant not only to the goal of the research, but also to the validity of its results. The problem that particularly concerns us here is the discrepancy between speech on the one hand and written language on the other. In this context, we have often encountered in the past a situation where the results of research conducted on one variety of the Chinese language have been generalized to the entire synchronic state of the language, i.e. to all other varieties of the language, while ignoring the aforementioned discrepancy between the spoken and written forms. The discrepancy between the spoken and written forms is likely to be present in any natural language with a written tradition, but its degree varies from language to language: compared to Slovak, for example, the situation in Chinese is in this respect extraordinary. Nevertheless, it is surprising that quantitative (qualitative) research on discrepancies between different varieties of the language has not yet attracted attention in Chinese linguistics to an extent befitting the unique situation of this natural language.

In light of the above, modern Chinese is differentiated into spoken language, kǒuyǔ 口語, and written language, shūmiànyǔ 書面語. Both terms are defined very vaguely in Chinese linguistics, so let us use at least the definition of kouyu which is presented in a university textbook for students of the Chinese language: »Kouyu is the oral language used by people having a colloquial style.« Shumianyu is then »a language recorded in Chinese characters and constituted by kouyu. It can be easily re-cultivated until it becomes very concise, rigorous, compact, and thus has a different style than kouyu.« Kouyu is usually translated into foreign languages as ‘spoken language’, occasionally as ‘colloquial style’*. Shumianyu is generally translated as ‘written language’, less commonly as ‘literary language’. Shumianyu is, inter alia, lexically characterized by an abundance of archaisms, idioms and sayings.

*For more information, see William Hannas, Asia’s Orthographic Dilemma (Honolulu: University of Hawaii Press, 1997), p. 248

The author concludes:

If there were a corpus of spoken language with adequate parameters, we could then draw the outlines of different varieties of the language. But not only that: it could serve as a fixed point of reference that reflects the current state of the language, as well as a baseline against which future states of the language may be compared in terms of lexical change and syntactic patterns. All these elements could also help in studying the stratification of the Chinese language, a field of linguistics that deserves more attention, as until now only a fraction of that stratification has been examined.

Beyond linguistic research, description is not a goal in itself, but also a means to a better picture of shumianyu (supported by statistical data). It should result in a grammar (or dictionary) of shumianyu, which is needed in second language acquisition nowadays. Thus, students of Chinese (primarily non-native speakers) would have a better opportunity to master both the written and spoken forms.

This contribution should be seen as a stimulus for further research on the spoken language rather than as an answer to questions concerning the discrepancy between kouyu and shumianyu.

In addition to all that the author says above, I maintain that spoken and written Sinitic are bound to differ so vastly because of the nature of the non-phonetic writing system.  Sinographs are simply poorly suited for recording the sounds of spoken language.  Moreover, because of their semantically heavy nature, they inevitably emphasize meaning over sound.

 

[h.t. John Rohsenow, Martin Schwartz]



7 Comments

  1. john rohsenow said,

    December 13, 2024 @ 2:01 am

    Perhaps written 'baihua' can be more accurately described as 'ban bai–ban wen', still retaining many features of the older pre-May 4th Movement 'wen-yan'.

  2. Peter Grubtal said,

    December 13, 2024 @ 4:17 am

    "greater than for any other language" :
    How about Arabic? Perhaps it depends on the country, and on whether you exclude written forms of colloquial speech and consider only the difference between colloquial speech and formal written Arabic (MSA).

  3. Andreas Johansson said,

    December 13, 2024 @ 8:23 am

    How does one quantify a distance between spoken and written language? I mean, there are several dimensions along which they could differ – the spelling could poorly reflect the pronunciation, there could be differences in vocabulary, in syntax, or morphology, etc. – and it's not obvious how you compare so much syntactic difference to so much difference in vocabulary.

  4. Lasius said,

    December 13, 2024 @ 8:53 am

    I also don't understand what is meant by this distance. In theory you could write a "Chinese" sentence that could also be understood in Japanese, which is an entirely unrelated language, disregarding the copious loans. But it is purely due to the "non-phonetic" writing system, that you could theoretically substitute any arbitrary pronunciation as long as the meaning of the word is roughly the same.

    In light of this, is there any real distance to written Sinitic whether 蜜 is pronounced mi4, mat6 or mig8?

  5. Terry K. said,

    December 13, 2024 @ 10:08 am

    It strikes me there's an aspect of the standard language vs written language that interrelates. There's the standard written language, there's the standard language in spoken form, and then there's the language as commonly spoken. In Spanish and German (to name a couple languages I'm familiar with) it's pretty straightforward to read out the written language phonemically and get a standard spoken language. (With some exceptions; numbers, and borrowed words not respelled come to mind.)

  6. Doctor Science said,

    December 13, 2024 @ 12:33 pm

    I'm currently watching a Chinese drama that would provide some good course material for those of you who are teaching about this topic. I'm watching the 2017 PRC series of Legend of the Condor Heroes, which is a remake of the 1983 Hong Kong series. There are some changes to the plot & script, but often you can find the same scenes pretty much word-for-word in the two series–except the words sound *very* different. It's especially noticeable because the 2017 series often uses the same music as the 1983 series, so the differences in the words really stand out.

    A third point of comparison (which I'm not competent to make) would be the book on which both series are based. I gather the book is written in a version of shumianyu which is "middlebrow": with a lot of emotions & fighting, but also a notable amount of chengyu and other "highbrow" literary techniques.

  7. Jonathan Smith said,

    December 13, 2024 @ 8:08 pm

    Yeah as others have pretty much said, comparing spoken language to written text is always to some extent apples and oranges, with the (to large degree) logographic representational level of Chinese-type scripts making this discrepancy particularly salient.

    For closer to apples to apples, one could compare e.g. spoken modern standard Mandarin to MSM vocalizations of written texts. If say your typical novel / news story / scientific article, the differences would be considerable but probably not more so than observed with English — we might call this Mandarin-linked (but also rather classicizing [or "un-declassicized"?]) written idiom "modern standard baihuawen" or something. Whereas with other Sinitic languages (Cantonese, Taiwanese…), there exist various species of variously standardized contemporary writing which, when vocalized, will range from relatively near to extremely distant from the spoken vernaculars of the same communities. Here the more distant species tend to share a great deal with "modern standard baihuawen" above in *orthographical* terms.

    If we're talking about vocalizations of wenyanwen or classical texts via conventions borrowed from some modern system, we're still more on different planets…
