Federico Escobar pointed me to an essay by David Brooks, "The 2016 Sidney Awards, Part I", NYT 12/27/2016:
Perry Link once noticed that Chinese writers use more verbs in their sentences whereas English writers use more nouns. For example, in one passage from the 18th-century Chinese novel “Dream of the Red Chamber,” Cao Xueqin uses 130 nouns and 166 verbs. In a similar passage from “Oliver Twist,” Charles Dickens uses 96 nouns and 38 verbs. […]
Link notes that Indo-European languages tend to use nouns even when verbs might be more appropriate. Think of the economic concept inflation. We describe it as a thing we can combat, or whip or fight. But it’s really a process.
Link takes this thought in a very philosophical direction, but it set me wondering how much our thinking is muddled because we describe actions as things. For example, we say someone has knowledge, happiness or faith (a lot of faith or a little faith, a strong faith or a weak faith); but faith, knowledge and happiness are activities, not objects.
Of course I wondered about this, since David Brooks was post-truth before post-truth was cool (see e.g. "Reality v. Brooks", 6/1/2015). And it's likely to puzzle both philosophers and psychologists to be told that they view faith, knowledge, and happiness as objects.
So I went to the cited essay — Perry Link, "The Mind: Less Puzzling in Chinese?", NYRB 6/30/2016.
And I discovered, somewhat to my surprise, that Brooks reports Link's noun and verb counts correctly:
Wanting to test my intuition that classical Chinese was more verb-heavy than its Indo-European counterparts, I opened Confucius’s Analects and an English translation of Plato’s Apology of Socrates and counted nouns and verbs. Confucius uses slightly more verbs than nouns. Plato uses about 45 percent more nouns than verbs. In search of a more recent example (but still from before the major Western-language influence on Chinese), I chose at random a page from Cao Xueqin’s eighteenth-century novel Dream of the Red Chamber and a page from Charles Dickens’s Oliver Twist. The Cao page had 130 nouns and 166 verbs (a 0.8 to 1 ratio), while the Dickens page had 96 nouns and 38 verbs (a 2.5 to 1 ratio).
I remembered from reading Link's 2013 book (see "Perry Link on Chinese 'rhythm, metaphor, politics'", 5/13/2013) that he does engage the mind-body problem in grammatical terms. In the NYRB article, he wonders whether
people who think in Indo-European languages [are] better off because their languages lead them to clear conceptualization of an important puzzle, or are thinkers in Chinese better off because their language gets them through life equally well without the puzzle?
But the theory about faith, knowledge, and happiness being "activities" is pure Brooksian meta-grammatical invention — in this case resonating with one of his fundamental themes, namely the allegedly profound psychological differences between West and East.
And Link's part-of-speech counts still worry me, for two reasons: First, his sample was small; second, the definitions of noun, verb, and word are problematic in contexts like this.
To explore these problems, I took a quick look at part-of-speech counts in some published English and Chinese treebanks.
The 2015 edition of the Penn Treebank is a revision of the original 1994 Penn Treebank analysis of more than a million words of 1989 Wall Street Journal stories — excluding punctuation, there are 1,030,982 word tokens. Of these,
- 367,124 are (one of the various types of) nouns, or 35.6%
- 157,222 are (one of the various types of) verbs, or 15.2%
So the noun/verb ratio is 2.34 to 1.
But wait — there is also a category of "modal", which is a kind of verb. If we include those, the results are
- 169,232 verbs, or 16.4%
and now the noun/verb ratio is 2.17 to 1.
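For readers who want to try this kind of tally themselves, here is a minimal sketch of one way to do it. It assumes the corpus has already been reduced to (word, tag) pairs using the Penn Treebank tag set, with punctuation removed. The grouping of tags into "noun", "verb", and "modal", the function name `noun_verb_ratio`, and the toy sentence are all illustrative assumptions, not a record of the exact procedure behind the counts above.

```python
from collections import Counter

# Assumed grouping of Penn Treebank tags; other groupings are defensible.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
MODAL_TAGS = {"MD"}


def noun_verb_ratio(tagged_tokens, include_modals=False):
    """Return (nouns, verbs, ratio) for an iterable of (word, tag) pairs."""
    tag_counts = Counter(tag for _word, tag in tagged_tokens)
    verb_tags = VERB_TAGS | MODAL_TAGS if include_modals else VERB_TAGS
    nouns = sum(tag_counts[tag] for tag in NOUN_TAGS)
    verbs = sum(tag_counts[tag] for tag in verb_tags)
    ratio = nouns / verbs if verbs else float("inf")
    return nouns, verbs, ratio


# Hypothetical toy input, tagged by hand for illustration.
toy = [
    ("Inflation", "NN"),
    ("can", "MD"),
    ("hurt", "VB"),
    ("the", "DT"),
    ("economy", "NN"),
]

print(noun_verb_ratio(toy))                       # modals not counted as verbs
print(noun_verb_ratio(toy, include_modals=True))  # modals counted as verbs
```

Even in this five-word toy, the decision about modals moves the ratio from 2 to 1 down to 1 to 1, which is the point: the definition of "verb" is a parameter, not a given.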
However, news text in general is pretty noun-heavy, and the WSJ is especially noun-y. So let's compare the 2012 English Web Treebank, whose sources are "weblogs, reviews, question-answers, newsgroups, email". Using the same definitions, this source contains 218,783 word tokens, of which
- 60,737 are nouns, or 27.8%
- 43,826 are verbs (including modals), or 20.0%
for a noun/verb ratio of 1.39 to 1.
The Chinese Treebank 9.0 includes data from newswire, broadcast material, magazine articles, government documents, and web text, comprising 3,247,331 characters divided into 2,084,412 words. Of these
- 559,975 are nouns, or 26.9%
- 423,568 are verbs, or 20.3%
This gives us a noun/verb ratio of 1.32 to 1.
But wait — the "verb" category includes the tag VA, about which the CTB part-of-speech tagging guide says:
VA roughly corresponds to adjectives in English and stative verbs in the literature on Chinese grammars. […] One open question about adjectives is whether they form a subclass of verbs in Chinese. We will not get into that debate.
[Note that the Chinese Treebank also has an adjective category JJ.]
There are 43,668 words marked as VA, or 2.1% — if we remove these from the verb category, we get
- 379,900 verbs, or 18.2%
and the noun/verb ratio becomes 1.47 to 1.
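To make the arithmetic easy to check, the sketch below recomputes the percentages and ratios from the raw counts quoted in this post. The counts are the ones given above; the dictionary labels and the helper name `summarize` are just illustrative choices.

```python
# (nouns, verbs, total word tokens) as quoted in the text above.
COUNTS = {
    "PTB WSJ, verbs without modals": (367_124, 157_222, 1_030_982),
    "PTB WSJ, verbs with modals": (367_124, 169_232, 1_030_982),
    "English Web Treebank, verbs with modals": (60_737, 43_826, 218_783),
    "Chinese Treebank 9.0, verbs with VA": (559_975, 423_568, 2_084_412),
    "Chinese Treebank 9.0, verbs without VA": (559_975, 379_900, 2_084_412),
}


def summarize(nouns, verbs, total):
    """Return (noun %, verb %, noun/verb ratio), rounded as in the text."""
    noun_pct = round(100 * nouns / total, 1)
    verb_pct = round(100 * verbs / total, 1)
    ratio = round(nouns / verbs, 2)
    return noun_pct, verb_pct, ratio


for label, (nouns, verbs, total) in COUNTS.items():
    noun_pct, verb_pct, ratio = summarize(nouns, verbs, total)
    print(f"{label}: {noun_pct}% nouns, {verb_pct}% verbs, {ratio} to 1")
```

Laid out this way, the range within English alone (2.34 down to 1.39) is wider than the gap between the English web text and either Chinese figure.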
And there's another wrinkle, having to do with the treatment of proper nouns and compound nouns. Thus 新华社 is treated in the Chinese treebank as a single word of category "proper noun", whereas the English translation "Xinhua News Agency" is three words, each one separately tagged as a proper noun. Similarly, 东南亚 is treated as a single proper noun meaning "Southeast Asia", which would be two proper nouns in an English treebank version.
If we split the Chinese "words", the Chinese noun count would rise; if we joined the English "words", the English noun count would fall.
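Here is a rough sketch of how much the definition of "word" can matter on the English side. It joins each run of consecutive tokens tagged NNP into a single unit before counting nouns. This is a crude heuristic of my own for illustration (it would wrongly fuse two adjacent but distinct names), and the function names and the hand-tagged toy sentence are hypothetical.

```python
from itertools import groupby


def merge_proper_noun_runs(tagged_tokens):
    """Join each run of consecutive NNP tokens into one (phrase, 'NNP') unit."""
    merged = []
    for is_nnp, group in groupby(tagged_tokens, key=lambda wt: wt[1] == "NNP"):
        group = list(group)
        if is_nnp:
            merged.append((" ".join(word for word, _tag in group), "NNP"))
        else:
            merged.extend(group)
    return merged


def count_nouns(tagged_tokens):
    """Count tokens whose Penn Treebank tag is NN, NNS, NNP, or NNPS."""
    return sum(1 for _word, tag in tagged_tokens if tag.startswith("NN"))


# Hypothetical toy input, tagged by hand for illustration.
toy = [
    ("Xinhua", "NNP"), ("News", "NNP"), ("Agency", "NNP"),
    ("reported", "VBD"),
    ("growth", "NN"),
    ("in", "IN"),
    ("Southeast", "NNP"), ("Asia", "NNP"),
]

print(count_nouns(toy))                          # each NNP token counted separately
print(count_nouns(merge_proper_noun_runs(toy)))  # each NNP run counted once
```

In this toy sentence the noun count drops from six to three while the single verb stays put, so the noun/verb ratio is halved by a tokenization decision alone.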
So adding it all up, we can conclude that within each language, noun/verb ratios vary a great deal depending on the style of the material analyzed, the definition of "noun" and "verb", and the definition of "word". And within those parameters, it's not at all obvious that there's actually any significant difference in nouniness between modern Mandarin Chinese and English.
[I should add that I haven't shown that no relevant difference exists — perhaps if we controlled carefully for genre, register, and grammatical definitions, a nouniness signal would emerge, either as a result of fundamental grammatical differences or as a result of cultural differences in writing style. And perhaps it could be shown that the hypothetical nouniness differences have psychosocial consequences. But so far, this looks to me like a seductive story without any real content — in contrast to the obvious and real differences in determiners, plural marking, and classifiers.]
Here's a final quote from Perry Link's 2013 book An Anatomy of Chinese: Rhythm, Metaphor, Politics, in which he pulls the Whorfian punch that seems to have attracted Brooks' admiration:
[I]t might be that Western languages talk about “entities” rather too much— perhaps thereby creating problems where there needn’t have been problems, or at least not such tough ones. Western philosophers have long wrestled with what we mean by terms like “the good,” “mind,” “reality,” and “existence.” These are nouns, and we might ask how much of Western puzzlement over them has had to do with trying to figure out what “things” they are. In Chinese it is extremely awkward to translate “the good” as a noun; “reality” and “existence” as nouns are marginally more possible, but still are more easily discussed using verbs or other parts of speech. […]
Despite some very interesting contrasts, however, on the whole I found more similarities than differences in comparing the conceptual metaphors of Chinese and English. The puzzles about “before” and “after” as spatial metaphors for time led to very similar answers for the two languages. […] Even something as basic as “high is more” (high level, high octane, etc.) can be seen as having the simple experiential basis that, originally, the more physical objects one puts in a place, the higher a pile becomes. Other theorists have gone further, claiming that it is not just common experience but the hardwiring of the human brain that leads to commonalities in perception. Kant claims this for concepts of space and time, and Chomsky for fundamental grammatical structures.