According to David Brooks, "Harmony and the Dream", NYT, 8/11/2008:
The world can be divided in many ways — rich and poor, democratic and authoritarian — but one of the most striking is the divide between the societies with an individualist mentality and the ones with a collectivist mentality.
This is a divide that goes deeper than economics into the way people perceive the world. If you show an American an image of a fish tank, the American will usually describe the biggest fish in the tank and what it is doing. If you ask a Chinese person to describe a fish tank, the Chinese will usually describe the context in which the fish swim.
These sorts of experiments have been done over and over again, and the results reveal the same underlying pattern. Americans usually see individuals; Chinese and other Asians see contexts.
When the psychologist Richard Nisbett showed Americans individual pictures of a chicken, a cow and hay and asked the subjects to pick out the two that go together, the Americans would usually pick out the chicken and the cow. They’re both animals. Most Asian people, on the other hand, would pick out the cow and the hay, since cows depend on hay. Americans are more likely to see categories. Asians are more likely to see relationships.
Those who've followed our previous discussions of David Brooks' forays into the human sciences ("David Brooks, Cognitive Neuroscientist", 6/12/2006; "David Brooks, Neuroendocrinologist", 9/17/2006) will be able to guess what's coming.
In this case, Mr. Brooks has taken his science from the work of Richard E. Nisbett, as described in his 2003 book The Geography of Thought: How Asians and Westerners Think Differently and Why, and in many papers, some of which are cited below. I was familiar with some of this work, which has linguistic aspects, and so I traced Brooks' assertions to their sources. And even I, a hardened Brooks-checker, was surprised to find how careless his account of the research is. The relation between Brooks' column and the facts inspired me to model my discussion after the Radio Yerevan jokes that arose in the Soviet Union as a way to mock the pathetically transparent spin of the Soviet media:
Question to Radio Yerevan: Is it correct that Grigori Grigorievich Grigoriev won a luxury car at the All-Union Championship in Moscow?
Answer: In principle, yes. But first of all it was not Grigori Grigorievich Grigoriev, but Vassili Vassilievich Vassiliev; second, it was not at the All-Union Championship in Moscow, but at a Collective Farm Sports Festival in Smolensk; third, it was not a car, but a bicycle; and fourth he didn't win it, but rather it was stolen from him.
Question to Language Log: Is it correct that if you show an American an image of a fish tank, the American will usually describe the biggest fish in the tank and what it is doing, while if you ask a Chinese person to describe a fish tank, the Chinese will usually describe the context in which the fish swim?
Answer: In principle, yes. But first of all, it wasn't a representative sample of Americans, it was undergraduates in a psychology course at the University of Michigan; and second, it wasn't Chinese, it was undergraduates in a psychology course at Kyoto University in Japan; and third, it wasn't a fish tank, it was 10 20-second animated vignettes of underwater scenes; and fourth, the Americans didn't mention the "focal fish" more often than the Japanese, they mentioned them less often.
The research in question was reported in T. Masuda and R.E. Nisbett, "Attending holistically vs. analytically: Comparing the context sensitivity of Japanese and Americans", J. Pers. Soc. Psychol. 81:922–934, 2001.
The subjects were 36 Americans at the University of Michigan and 41 Japanese at Kyoto University, who "participated in the experiments as a course requirement". The subjects each watched 10 animated vignettes of underwater scenes, where "Each scene was characterized by having 'focal fish,' which were large and had salient colors and shapes, moving in front of a complicated scene".
After a vignette was presented twice, the subject was asked “What did you see in the animation? Please describe it, taking as much as 2 min.” The subjects' oral responses were recorded, transcribed and coded.
The data were coded as belonging to one of the following categories: (a) focal fish, (b) background fish, (c) active animals, (d) inert animals, (e) plants, (f) bubbles, (g) floor of scene, (h) water, and (i) environment. … The categories were grouped into four superordinate categories. Focal fish remained an independent category. Background fish and active animals were grouped and named active objects, representing peripheral but moving objects. Inert animals and plants were categorized as inert objects. Finally, bubbles, floor of scene, water, and environment were categorized as background.
Here's a table of the results:
Note that each Japanese subject actually mentioned one of the focal fish, on average, 130.32 times in describing 10 vignettes (an average of about 13 "focal fish" mentions per vignette), while each American subject mentioned one of the focal fish merely 117.91 times on average across 10 vignettes.
Now, to be fair to David Brooks, the authors found another way of looking at the data that accords better with what they hoped to prove. If you collapse everything down to two categories (basically the moving stuff and the stationary stuff) and look only at the subject of the first sentence of each description, you find that the Americans did feature the "salient objects" somewhat more often, on average, while the Japanese did feature the "field" more often.
This research is described on pp. 89-90 of Nisbett's book; as far as I can tell, the experiment has never been replicated with Chinese or other Asian subjects (though if I'm wrong, please feel free to tell me in the comments).
Question to Language Log: Is it correct that when the psychologist Richard Nisbett showed Americans individual pictures of a chicken, a cow and hay and asked the subjects to pick out the two that go together, the Americans would usually pick out the chicken and the cow, since they're both animals, whereas most Asian people would pick out the cow and the hay, since cows depend on hay?
Answer: In principle, yes. But first of all, it wasn't the psychologist Richard Nisbett, but rather the psychologist Lian-Hwang Chiu. And second, it wasn't a representative sample of Americans and Asians, but rather some fourth and fifth graders from rural Indiana and rural Taiwan. And third, the American kids didn't usually make their choices on the basis of a shared named category like "animal", but rather they did this about 18% of the time, on average, while the Chinese kids did it about 12% of the time. And fourth, the Chinese kids didn't usually make their choices on the basis of functional interdependence, but rather they did this about 43% of the time, on average, while the American kids did it about 21% of the time.
The source is Lian-Hwang Chiu, "A cross-cultural comparison of cognitive styles in Chinese and American children", International Journal of Psychology 7: 235-242, 1972.
The subjects were "221 fourth and fifth grade children selected from the rural communities in the northern part of Taiwan" and "316 children of same grades sampled from the rural districts in the north-central part of Indiana". They were given a "28-item cognitive style test" where
Each item consisted of three pictures representing human, vehicle, furniture, tool, or food categories. The task for the subject was to select any two out of the three objects in a set which were alike or went together and to state the reason for his choice.
Each response was classified into one of the following categories:
(1) Descriptive — similarities identified on the basis of manifest objective attributes.
(a) Descriptive-analytic — responses denoting observable parts of an item; e.g., classifying human figures together "because they are both holding a gun".
(b) Descriptive-whole — similarities based on the total objective manifestations of the stimuli; e.g. grouping human figures together "because they are small".
(2) Relational-contextual — similarities identified on the basis of functional or thematic interdependence between the elements in a grouping; e.g., classifying human figures as similar "because the mother takes care of the baby."
(3) Inferential-categorical — similarities identified on the basis of inferred characteristics of the stimuli.
(a) Functional — items are grouped on the basis of inferred use; e.g., a saw and an ax are grouped together "because these are things to cut".
(b) Class-naming — classification on the basis of class membership; e.g., an apple and a banana are selected "because they are fruits."
(c) Attribute selection – items classified on the basis of an inferred or non-manifest attribute; e.g., a boat and a jeep are grouped together "because they both have a motor".
(d) Location — inference as to where an item belongs geographically; e.g., a cow and a horse are grouped together "because they both live on a farm".
And here are the results:
This is the only way the results were scored — there is no separate tally that would correspond to Brooks' statement that
… the Americans would usually pick out the chicken and the cow. They’re both animals. Most Asian people, on the other hand, would pick out the cow and the hay, since cows depend on hay.
This research is discussed on pp. 140-141 of Nisbett's book. As far as I can tell, this particular experimental paradigm has never been replicated with any other populations of Americans or Asians. (Again, better information is welcomed in the comments.)
Exercise for the reader: Determine how much overlap there probably was between Chinese and American kids in each of Chiu's coding categories, based on the means and standard deviations given. Express your answer in terms of the probability that a randomly chosen Chinese kid would give more/fewer answers of a given type than a randomly chosen American kid.
You'll find that some of the differences are fairly large, in this way of looking at things, though there is still quite a bit of overlap.
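For anyone who wants to try the exercise, here is a minimal sketch in Python. It assumes that the per-child counts in each coding category are independent and roughly normally distributed, which is only an approximation for bounded counts on a 28-item test. The function name and the example numbers are mine; the numbers are hypothetical placeholders, not Chiu's figures, and should be replaced with the means and standard deviations from the table.

```python
import math

def prob_superiority(mean_a, sd_a, mean_b, sd_b):
    """P(a random member of group A scores higher than a random member of group B),
    assuming independent, normally distributed scores (an approximation)."""
    z = (mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical placeholder values, NOT taken from Chiu's table:
# substitute the reported means and SDs for the category of interest.
p = prob_superiority(mean_a=12.0, sd_a=5.0, mean_b=6.0, sd_b=4.0)
print(f"P(A > B) = {p:.2f}")
```

A value of 0.5 means the two groups are indistinguishable on that category, and a value near 1 means almost no overlap.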
Question to Language Log: Is it correct that these sorts of experiments have been done over and over again, and that the results reveal the same underlying pattern: Americans usually see individuals, while Chinese and other Asians see contexts?
Answer: In principle, yes. But first of all, depending on how the experiments are done and who the subjects are, the effects are sometimes small or non-existent; and second, in some of the experiments, young Asians show as much tendency to categorize as Americans, or even more; and third, "Asian" behavior is found in some other groups of subjects, for example working-class Italians; and fourth, nearly all the experiments are done with words, and when you test bilingual Chinese subjects in both English and Chinese, about half the effect sometimes goes away in the English version of the test. [And, I should add, an increased disposition to group things categorically (or "abstractly") rather than thematically (or "concretely") has been implicated in the world-wide trend towards higher IQ scores known as the "Flynn effect".]
I'll take a look at three papers.
Let's start with Angela Gutchess, Carolyn Yoon, Ting Luo, Fred Feinberg, Trey Hedden, Qicheng Jing, Richard E. Nisbett, Denise C. Park, "Categorical Organization in Free Recall across Culture and Age", Gerontology 52(5): 314-323, 2006.
Young adults (ages 18–22) and elderly adults (ages 60–78) were recruited and tested in Ann Arbor, Mich., USA, or Beijing, China, with 32 participants in each of the groups. […]
Two lists of 20 words were created: one consisting of unrelated words, and the other consisting of words related by category. For the related list, five items were drawn from each of four categories (fruits, internal organs, times of day, chemical elements) deemed to be equivalent across age and culture. The strongest exemplars (e.g. ‘apple’ for the category ‘fruit’) were avoided to prevent correct guessing of items when participants could not recall items from the related list. Whenever possible, the same category exemplars were included in the American and Chinese lists; however, the response distributions across cultures made it necessary to use different exemplars across cultures in some cases, but the same categories were always used. […]
Participants studied two lists, each containing 20 words arranged in a randomized order for each individual. Words were presented simultaneously aurally, through headphones, and visually, on a computer screen, in the participant’s native language (English or Mandarin Chinese). Words were presented visually for 4 seconds each, with a 1-second blank display before the next item. At the conclusion of the list, there was a 1-min retention interval, filled with a written subtraction by 7’s task. Participants then had 2 min to recall all the words they could remember, in any order, into a tape recorder. The procedure was then repeated for the second word list. The control list always preceded the list of categorized words. This was done to avoid any possible group differences in the expectation of relationships between the control words, which participants may be more prone to search for if they were to encode the related word list beforehand.
But it turned out that neither the young people nor the old people showed any Ann-Arbor-vs.-Beijing differences in recall as a function of categorization:
The authors looked at the data a different way, to quantify the subjects' tendency to recall items in an order that reflected category membership. Here's the paper's description of the analysis:
For the clustering analysis, the number of items generated in succession from a single category was coded. If repetitions occurred within same-category clusters, they were omitted from the clustering analysis, but did not mark a break in the cluster. Adjusted ratio of clustering (ARC) scores are a common measure of clustering, considered suitable across a range of numbers of items recalled. An ARC score of 1 denotes perfect clustering (i.e. all items from a single category are recalled together before moving to the next category), whereas a score of 0 denotes clustering at the level of chance (i.e. item order is ‘zero-order’ or entirely random, given a fixed set of mentioned items). Negative ARC scores (except in rare cases, bounded at –1) are possible when less categorization occurs than by chance (i.e. items from different categories tend to be output adjacent to one another). Comparing across groups using standard statistical techniques requires not only that the means of ARC distributions be the same for any number of mentioned items – and they are, in fact, always zero – but that the standard deviations be the same as well, which they are not. For example, with four categories of five items each, we can compute the ‘by chance’ ARC distribution for any number of mentioned items; for all 20 mentioned, the ARC standard deviation is 0.148, while for 6 items, it is 0.537, or 3.5 times greater. In this way, ARC scores do, in fact, depend upon the number of items recalled: because conditional distributions can differ across groups when recall performance differs, average ARC should not be used for between group comparisons when groups differ on number of items recalled. It is plainly inappropriate to use standard t-type means-based tests, then, on the raw computed ARC score. 
To address this problem, we transformed the conditional (on number of items recalled) ARC distributions to conform to the same distributional family, one as close as possible to the calculated, empirical ARC distributions. We henceforth refer to these individual level clustering measures as transformed ARC scores.
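To make the raw ARC measure concrete, here is a minimal sketch in Python. It assumes the standard definition of the adjusted ratio of clustering: observed same-category repetitions minus the chance expectation, divided by the maximum possible minus the chance expectation. The function name and the single-letter category labels are mine, and this computes only the raw score, not the authors' transformed ARC, whose details are not given in the passage quoted above.

```python
import math
from collections import Counter

def arc_score(recalled_categories):
    """Raw adjusted ratio of clustering for one recall sequence.

    recalled_categories: the category label of each recalled item, in output
    order, with repeated items already removed. Returns None when the score
    is undefined (nothing recalled, one category only, or no repeats possible).
    """
    n_total = len(recalled_categories)
    if n_total == 0:
        return None
    counts = Counter(recalled_categories)
    # Observed repetitions: adjacent pairs of items from the same category.
    observed = sum(a == b for a, b in zip(recalled_categories, recalled_categories[1:]))
    # Repetitions expected by chance for this set of recalled items.
    expected = sum(n * n for n in counts.values()) / n_total - 1
    # Repetitions under perfect clustering.
    maximum = n_total - len(counts)
    if math.isclose(maximum, expected):
        return None
    return (observed - expected) / (maximum - expected)

# F = fruit, O = organ, T = time of day (labels are illustrative only).
print(arc_score(list("FFFOOOTTT")))  # perfectly clustered recall
print(arc_score(list("FOTFOTFOT")))  # categories strictly interleaved
```

The first call gives a score of 1 and the second a negative score, matching the description of perfect and below-chance clustering in the quoted passage.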
Here's the result — the young Chinese actually had a higher ARC score than the young Americans, though not significantly so, while the old Chinese had a slightly lower ARC score than the old Americans:
… the young Chinese (mean = 0.90, SD = 0.15) had transformed ARC scores that did not differ significantly from the young Americans (mean = 0.82, SD = 0.28), t (62) = 1.51, p = 0.14, while the elderly Chinese (mean = 0.48, SD = 0.50) had disproportionately lower transformed ARC scores relative to the American elderly (mean = 0.70, SD = 0.31), t (62) = 2.11, p < 0.04.
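As a sanity check, the reported t statistics can be roughly recomputed from the summary numbers. This is a minimal sketch that assumes an ordinary pooled-variance two-sample t-test with 32 subjects per group (the group size given earlier), which is consistent with the 62 degrees of freedom reported. The function name is mine, and the authors' actual procedure may differ.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t statistic and degrees of freedom from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Young adults: Chinese vs. American transformed ARC scores, as quoted above.
t_young, df_young = pooled_t(0.90, 0.15, 32, 0.82, 0.28, 32)
# Elderly adults: American vs. Chinese transformed ARC scores, as quoted above.
t_old, df_old = pooled_t(0.70, 0.31, 32, 0.48, 0.50, 32)

print(f"young:   t({df_young}) = {t_young:.2f}")
print(f"elderly: t({df_old}) = {t_old:.2f}")
```

With the rounded means and SDs quoted above, this gives roughly 1.4 for the young groups and roughly 2.1 for the elderly groups. The second is close to the published value; the first is a little below the published 1.51, which may simply reflect rounding in the reported summary statistics.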
This essentially negative result surprised the researchers enough that they did the experiment over again:
The surprising disconnect between free recall and categorical clustering found in experiment 1 could reflect the strong categorical associations of the list items. The strong associations could make the categorical organization apparent and evident as a possible strategy, consistent with the finding of high transformed ARC scores. Even though the organizational scheme may not be commonly employed as a spontaneous strategy in some groups, the salience of the manipulation may have contributed to the finding in experiment 1. In order to address this possibility, in experiment 2 we selected words less strongly associated with the categories and assessed memory and categorical clustering in new samples across age and culture. This approach afforded the benefit of a more subtle manipulation that might allow cultural biases to emerge more clearly.
But in fact the results of the second experiment were not very different:
This time the cultural difference came out statistically significant when the groups of all ages are treated together, though barely so: the ARC scores for the Americans had mean = 0.74, SD = 0.33, while those for the Chinese had mean = 0.60, SD = 0.38. The authors don't tell us what the statistical tests showed for the groups considered separately, but eyeballing the graph suggests that for the young people, the effect of "culture" was still not statistically significant.
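To get a feel for the size of that pooled difference, here is a minimal sketch that converts the reported means and SDs into a standardized effect size (Cohen's d). It assumes equal group sizes, so that the two SDs can be combined by their root mean square; the group sizes for the second experiment are not given above, and the function name is mine.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference, combining the two SDs by root mean square
    (this assumes equal group sizes, which is not confirmed by the text)."""
    return (mean_a - mean_b) / math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)

# Experiment 2, all ages pooled: Americans vs. Chinese ARC scores.
d = cohens_d(0.74, 0.33, 0.60, 0.38)
print(f"Cohen's d = {d:.2f}")
```

Under those assumptions d comes out at roughly 0.4, which by the usual rule of thumb falls between a "small" and a "medium" effect.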
The authors are resilient in insisting that these results support the theories of American/East-Asian cultural differences, at least to some extent:
Overall, our results provide some support for the notion that East-Asians use categories as an organizational strategy based on taxonomic categories less than do Americans. Whereas past studies demonstrated this effect in young adults, the difference was demonstrated primarily in elderly adults in the present experiments. The careful selection of words, equated across culture for both familiarity of items as well as overall structure of the category, may have prevented the expression of cultural differences in the young. […]
In conclusion, the present studies suggest that cultural differences in the use of categories as an organizational strategy are more prominent for older adults, with East Asians less likely to organize their recall by category. It is unlikely these effects are due to cohort differences due to careful sampling and similar findings of cross-cultural differences in young adults. Instead, our findings reflect the bias to process information less categorically in East-Asians, with the effect manifest in the older adults who have accrued years of experience in the culture. Despite the overwhelming effects of neurobiological aging, elderly adults express the signature of their culture in their use of cognitive strategies.
An alternative interpretation, which the authors reject because it disagrees with their view of the literature in this area overall, would be that Chinese culture is changing in a direction that makes its effect on this particular task more like the effect of American culture.
For our second paper in support of this joke, let's take a look at Nicola Knight and Richard Nisbett, "Culture, Class and Cognition: Evidence from Italy", Journal of Cognition and Culture 7: 283-291, 2007.
East Asians have been found to reason in relatively holistic fashion and Americans in relatively analytic fashion. It has been proposed that these cognitive differences are the result of social practices that encourage interdependence for Asians and independence for Americans. If so, cognitive differences might be found even across regions that are geographically close. We compared performance on a categorization task of relatively interdependent southern Italians and relatively independent northern Italians and found the former to reason in a more holistic fashion than the latter. Furthermore, as it has been argued that working class social practices encourage interdependence and middle class practices encourage independence, we anticipated that working class participants might reason in a more holistic fashion than middle class participants. This is what we found – at least for southern Italy.
They tested final-year students in four high schools in Italy. There were two schools with a "classics" orientation, likely to be attended by upper socio-economic status students; one of them was in Milan (in the industrial north of Italy) and the other was in Sicily. And there were two schools of the category known as Istituti Professionali di Stato per l’Industria e l’Artigianato, or "State Professional Institutes for Industry and Crafts", likely to be attended by working-class students. Again, one of the schools was in the north near Milan, while the other was in Calabria, in the south of Italy.
The subjects were given "a printed list of twenty items, each composed of three words (e.g., Monkey, Panda, Banana)", and asked to circle the two that "go together". In seven of the twenty items, the words could be grouped by categorical relations (Monkey and Panda) or by thematic relations (Monkey and Banana). The other 13 items were fillers. "Meaningless" pairings (e.g. Panda and Banana) were ignored. The rest of the responses were coded as 1 (thematic) or 0 (categorical), summed, and divided by seven, to yield a score between 0 and 1, with higher scores reflecting a greater tendency towards thematic answers.
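The scoring rule just described is simple enough to sketch in a few lines of Python. This is my reading of the description, not the authors' code: each of the seven test items is coded 1 (thematic), 0 (categorical), or ignored if the pairing was "meaningless", and the sum is divided by seven regardless of how many pairings were ignored. The function name and the sample responses are hypothetical.

```python
def thematic_score(codes, n_test_items=7):
    """Proportion-style score between 0 and 1.

    codes: one entry per test item, where 1 = thematic pairing,
    0 = categorical pairing, None = 'meaningless' pairing (ignored).
    """
    return sum(code for code in codes if code is not None) / n_test_items

# Hypothetical subject: four thematic answers, two categorical, one meaningless.
example = [1, 1, 0, 1, None, 1, 0]
print(round(thematic_score(example), 3))
```

One consequence of dividing by seven every time is that a "meaningless" pairing lowers the score exactly as a categorical answer would.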
Here are the results:
Figure 1. Proportion of thematic pairings by school and SES. Bars represent ± one standard error.
Thus southern Italians, especially the lower-class ones, are apparently quite Asiatic.
OK, the few of you that are still following along can splash some cold water on your face, pour another cup of coffee, and check out the third and last paper presented in support of our third Radio Yerevan joke: Li-Jun Ji, Zhiyong Zhang, and Richard E. Nisbett, "Is it culture, or is it language? Examination of language effects in cross-cultural research on categorization." Journal of Personality and Social Psychology 87: 57-65, 2004.
This paper brings in our old favorite, the Sapir-Whorf "linguistic relativity hypothesis", the concepts of "compound" vs. "coordinate" bilinguals, and for some other reasons as well is worth a longer discussion than I have the time or patience to give it this morning.
Again, there are two studies reported, probably because the results of the first study were problematic from the researchers' point of view.
In Study 1, the subjects were 119 Chinese students at Beijing University, 131 Chinese students at the University of Michigan, and 43 "European American" students at Michigan, who received course credit. (One thing that worries me about many of these studies is the fact that some of the students, presumably recruited from the researchers' own classes, were familiar with the themes and conclusions of the research, and thus knew how they were "supposed" to respond. This property is not always equally distributed among the "cultural" categories.)
The test was similar to the one used in the Italian experiment just described: participants were shown 20 sets of three words, and asked to indicate which two words in each set were most closely related, and why.
We used very simple words in the task so that it was easy for bilingual Chinese to understand the task in English. There were 10 sets of test items and 10 sets of fillers. The three words in each testing set could be grouped on the basis of thematic relations, categorical relations, or neither. Participants’ groupings were coded as relational if they suggested an object–context or subject–object relationship, such as monkey and bananas, shampoo and hair, or conditioner and hair. Groupings were coded as categorical if they suggested shared features or category memberships, for example, monkey and panda or shampoo and conditioner. Similarly, participants’ explanations were coded as either relational (e.g., “Monkeys eat bananas”) or categorical (e.g., “Monkeys and pandas are both animals”). Examples for filler items included child–teenager–adult and Monday–Wednesday–Friday.
Within each of the 10 testing sets, there were 3 possible ways for participants to select two items. In total, there were 30 possible ways of grouping, 14 of which were coded as relational (such as policeman and uniform, and postman and uniform) and 11 of which were coded as categorical (such as policeman and postman). Thus, the stimuli were biased toward relational grouping.
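To see how large that built-in bias is, here is a minimal sketch of what a subject answering at random would score. It assumes that such a subject picks one of the three possible pairings in each set with equal probability, and that the score of interest is the number of relational choices minus the number of categorical choices. Both are my assumptions for illustration, as are the names used.

```python
from fractions import Fraction

PAIRINGS_PER_SET = 3   # three ways to pick two words out of three
N_RELATIONAL = 14      # pairings coded relational across the 10 test sets (from the paper)
N_CATEGORICAL = 11     # pairings coded categorical across the 10 test sets (from the paper)

def expected_choices(n_coded_pairings):
    """Expected number of choices of one type for a subject who picks a pairing
    uniformly at random in every set (an assumption, not the paper's model)."""
    return Fraction(n_coded_pairings, PAIRINGS_PER_SET)

expected_relational = expected_choices(N_RELATIONAL)
expected_categorical = expected_choices(N_CATEGORICAL)
chance_bias = expected_relational - expected_categorical

print(float(expected_relational), float(expected_categorical), float(chance_bias))
```

On these assumptions, pure guessing would yield about one more relational than categorical choice across the ten test sets.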
The Americans were tested in English, while half of the Chinese were tested in Chinese, and half in English.
Here are the results in graphical form:
The first thing to notice is that none of these subjects are nearly as "relational" as the southern Italian vocational-school subjects were. Those kids gave a relational answer 85.6% of the time, on average. If the Chinese students had answered that way here, they'd presumably have given a relational answer on 8.56 of the 10 test questions, and a categorical answer on 1.44, on average, yielding a score of 8.56 - 1.44 = 7.12 (even without allowing for any effect of the test's bias in favor of relational answers). But the highest score for any of the Chinese groups was about 3.5.
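The conversion between the two scales can be sketched as follows. It assumes that every answer on a test item is either relational or categorical (no "neither" groupings), which is a simplification, and the function names are mine.

```python
import math

def difference_score(p_relational, n_items):
    """Relational-minus-categorical score for a given proportion of relational answers,
    assuming every answer is one or the other."""
    relational = p_relational * n_items
    categorical = (1 - p_relational) * n_items
    return relational - categorical

def implied_proportion(score, n_items):
    """Inverse mapping: proportion of relational answers implied by a difference score."""
    return (score / n_items + 1) / 2

print(round(difference_score(0.856, 10), 2))   # the Italian rate carried over to 10 items
print(round(implied_proportion(3.5, 10), 3))   # the highest Chinese group score, as a proportion
```

On this simplified reading, a difference score of about 3.5 on 10 items corresponds to relational answers about two-thirds of the time, compared with 85.6% for the Italian vocational-school students.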
Then we can notice what obviously struck the researchers most strongly — testing Mainland and Taiwan Chinese students in English eliminates roughly half of the difference between them and the American students. (The Hong Kong and Singapore students behaved about the same in both languages.) This was also true of the explanations offered by the students for their choices:
The authors offer two different explanations for this effect:
Why does language of testing affect categorization? There are at least two explanations. One could be that structural differences in English and Chinese lead to different reasoning styles, such that certain features of the Chinese language make people think in a relational way whereas certain features of the English language make people think in a categorical way. The other explanation does not pertain to language per se. Instead, it is likely that the language used for a task makes certain ways of reasoning more accessible by activating representations that are common in a particular culture.
It also seems possible that something much more superficial is going on: perhaps there is something about the particular set of ten word-triples used in this test, either in their form or in the way they're written, that biases the choice differently in the two languages. Unfortunately, the specific list of words used is among the details that should be reported for such studies, but are not.
The authors also worried that the language effects might be due to some differences in the kinds of students selected for the study. So they did the study again, using 59 Hong Kong University students (52 of whom were women) and 57 Beijing University students (29 of whom were women), and using a within-subjects design, with each subject taking the test twice, once in English and once in Chinese.
The test was also a different one, designed to eliminate the bias in favor of relational groupings: there were 8 test triples that each allowed only one way to be relational and one way to be categorical (at least in the experimenters' conception of the relations and categories involved). The results were basically the same:
Note that despite the less intrinsically biased nature of the material, the Mainland subjects in this case showed a much larger bias in the relational direction — a difference of 5 on 8 choices, as opposed to a difference of about 3.5 on 10 choices. This quantitative instability of the results from test to test, even for the same subject pool (in this case students at Beijing University), suggests that the specific choice of test words makes a big difference. Given this apparently very large effect, the quantitative comparison of results across languages seems very problematic to me. This also underlines how unfortunate it is that the norms of this discipline apparently don't include publishing a complete list of the materials used in an experiment.
Overall, it's certainly clear that there's something going on here, and that it has something to do with a differential propensity to group words and pictures by taxonomic categories as opposed to by functional relationships. But the group differences are generally moderate and quantitatively erratic, and I worry about the apparent commitment of the researchers in this field to maintain their initial conceptual framework, no matter how the experiments come out.
As for David Brooks, he wants to use this stuff as the scientific foundation for the hypothesis that western societies are fundamentally and essentially individualist while Asian societies are fundamentally and essentially collectivist. That might be true, but it's a long and winding road to that conclusion from the complex and equivocal results of various experiments on how people group various triples of words and pictures, or describe undersea scenes. And we should be wary of following David Brooks too far down that road, given that he can't be bothered to keep straight who did which experiments, or whether the subjects were Chinese or Japanese, or whether it was the Americans or the Asians who more often mentioned the focal fish, or essentially any of the evocative details that he loves to use to bring his ideas to life for his readers.