A month ago, I linked to Lera Boroditsky's WSJ piece "Lost in Translation", and promised to discuss the contents in more detail at some point in the future ("Boroditsky on Whorfian navigation and blame", 7/26/2010). At the time, I noted that there is probably no single linguistic idea that is more prone to exaggeration and mis-application than the "Sapir-Whorf hypothesis" about the relations between language and thought. And the WSJ editors' subhed for Boroditsky's article gives their readers a push down that road:
New cognitive research suggests that language profoundly influences the way people see the world; a different sense of blame in Japanese and Spanish.
Meanwhile, the NYT Sunday magazine has just published a major article by Guy Deutscher, "Does Your Language Shape How You Think?" (8/26/2010), which I hereby promise to discuss in detail at some point in the future. And in order not to let my neo-Whorfianism account fall too many promises in arrears, I'll actually post about Boroditsky's WSJ piece today. (I won't try to discuss both articles at the same time, because in this sort of thing, it's the scientific details that matter.)
So what's the evidence that there's "a different sense of blame in Japanese and Spanish"? Let's take up a representative set of experiments for Spanish first, from Caitlin Fausey and Lera Boroditsky, "English and Spanish speakers remember causal agents differently", CogSci 2008. In my earlier post, I invited you to read this as well as other articles by Fausey and Boroditsky if you're interested in neo-Whorfianism, and I hereby repeat the suggestion.
But when you read scientific articles — especially those dealing with general conclusions about complicated topics — you need to know what to pay attention to. And the best advice that anyone can give you is to ignore the conclusions until you're sure you have a good grasp of the facts.
In Fausey and Boroditsky 2008, the facts come from two experiments. In both cases, the important dependent variable was subjects' memory for "who did it?", asked with respect to each of a series of short video clips. Here's their description of the procedure in Experiment 1:
Videos of 16 unique events were prepared. Three white male adults acted as agents in an intentional version and an accidental version of each event. [The events were things like breaking a pencil, spilling water, dropping keys, etc.] [...]
Learning phase. During the learning phase, participants viewed 16 videos. Each video was a unique event, in which one of two male agents appeared. A man wearing a blue shirt acted as the agent in eight events and a different man wearing a yellow shirt acted as the agent in the other eight events. Each male appeared as the agent in four intentional events and in four accidental events. [...]
Across participants, each particular event featured the same actor as the agent (e.g., the blue-shirt man was always the agent in the balloon popping videos), but half of the participants viewed the intentional version of the event while the other half viewed its accidental counterpart. [...]
Distracter phase. After viewing 16 videos, participants were instructed to count to 10. Pilot testing revealed that a longer delay period between learning and test resulted in chance performance for agent memory in this paradigm.
Test phase. Each trial of the recognition memory test consisted of a probe video followed by a picture of both agents that had appeared during the learning phase (i.e., blue-shirt guy and yellow-shirt guy).
In each probe video, an unfamiliar man wearing a green shirt appeared as the causal agent of the same events that had been presented during the learning phase. For example, if a participant had seen the “accidental balloon popping” event during learning, s/he would see this same event acted by the new agent in the test phase. After viewing the probe video, participants were asked, “Who did it the first time?” (“¿Quién lo hizo la primera vez?”) and responded by mouse-clicking on one of the two agent pictures (either the blue-shirt guy or the yellow-shirt guy).
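The counterbalancing in the quoted procedure can be sketched in a few lines of Python. This is a hypothetical reconstruction for illustration only: the event numbering, the assignment of the first eight events to the blue-shirt actor, the alternation of versions, and the names `Trial` and `build_lists` are all assumptions, not details taken from the paper. What it shows is the structure the quote describes. Each event keeps the same actor for every participant, each actor contributes four intentional and four accidental events, and the two participant groups see opposite versions of every event.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    event: int    # hypothetical event index, 0..15
    actor: str    # "blue" or "yellow": fixed per event for every participant
    version: str  # "intentional" or "accidental": depends on the participant's list


def build_lists(n_events: int = 16) -> tuple[list[Trial], list[Trial]]:
    """Build two hypothetical counterbalanced learning lists, A and B."""
    list_a: list[Trial] = []
    list_b: list[Trial] = []
    for event in range(n_events):
        # Assumption: the first half of the events belong to the blue-shirt actor.
        actor = "blue" if event < n_events // 2 else "yellow"
        # Alternate versions so each actor gets half intentional, half accidental.
        version_a = "intentional" if event % 2 == 0 else "accidental"
        # List B shows the opposite version of the same event, with the same actor.
        version_b = "accidental" if version_a == "intentional" else "intentional"
        list_a.append(Trial(event, actor, version_a))
        list_b.append(Trial(event, actor, version_b))
    return list_a, list_b


list_a, list_b = build_lists()
# Tally (actor, version) pairs for list A: each of the four cells should hold 4 events.
print(Counter((t.actor, t.version) for t in list_a))
```

Any assignment with the same three properties would do; the alternation used here is just the simplest one to read.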
Before we get to the results, it's important to note something about the experimental design. The set-up has been carefully calibrated to generate interpretable results, i.e. results between the floor (where everyone's memory is at chance) and the ceiling (where everyone's memory is perfect). In order to reach this goal, certain choices were made in designing this experiment. The events (whether intentional or accidental) were banal and inconsequential; the actors were comparable in all characteristics (age, sex, ethnicity, formality of dress) except for shirt color; and the design made appropriate choices of the number of events to remember, the length of the "distractor phase", and the degree of motivation of the participants.
Changing any of these design characteristics might well push the results out of that crucial range above 50% and below 100% on the binary forced choice. There's nothing wrong with this — all psychological experiments have to be calibrated in this way if they're to produce any interpretable results at all. But by the same token, you need to ask about every experiment what sorts of generalization the calibration permits. We'll come back to this point later on.
The test was administered to three groups of subjects, in three different universities:
63 monolingual English speakers (Stanford University), 87 monolingual Spanish speakers (Universidad de Chile), and 38 Spanish-English bilinguals (University of California, Merced) received course credit or were paid for their participation. Half of the bilinguals completed the experiment in English and half in Spanish.
All three groups did about equally well in identifying the original agent in the intentional-event videos: 80.4% for the Stanford students, 78.5% for the Universidad de Chile students, and 75.9% for the UCM students. None of these differences were statistically significant. However, in the case of the accidental-event videos, the Stanford students did "significantly" (in the statistical sense) better (82.3%) than the Chilean students (74.1%) and the UCM students (68.8%).
The first thing to say about these results is that the differences are not very big. Here's a different way to frame the same results:
The Stanford students were 1.9 percentage points better at remembering accidental agents than intentional agents; the Chilean students were 4.4 points worse, and the UCM students were 7.1 points worse. Overall, the Stanford students were about 5.1 percentage points more accurate than the Chilean students, and about 9 points more accurate than the UCM students [(80.4+82.3)/2 = 81.4, (78.5+74.1)/2 = 76.3, (75.9+68.8)/2 = 72.4].
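For readers who want to check this arithmetic, here is a minimal Python sketch using only the six percentages reported above. The names `results`, `accidental_minus_intentional` and `overall` are ours, and taking the plain mean of the two conditions as "overall" accuracy assumes equal numbers of intentional and accidental trials (8 of each, per the procedure quoted above).

```python
# Percent correct on the agent-memory test, Experiment 1, as reported above.
results = {
    "Stanford": {"intentional": 80.4, "accidental": 82.3},
    "Chile":    {"intentional": 78.5, "accidental": 74.1},
    "UCM":      {"intentional": 75.9, "accidental": 68.8},
}


def accidental_minus_intentional(r: dict) -> float:
    """Difference in percentage points; negative means accidental agents were remembered worse."""
    return round(r["accidental"] - r["intentional"], 1)


def overall(r: dict) -> float:
    """Unweighted mean of the two conditions (assumes equal trial counts per condition)."""
    return (r["intentional"] + r["accidental"]) / 2


for school, r in results.items():
    print(f"{school:9s} diff={accidental_minus_intentional(r):+.1f}  overall={overall(r):.2f}")

gap_chile = overall(results["Stanford"]) - overall(results["Chile"])
gap_ucm = overall(results["Stanford"]) - overall(results["UCM"])
print(f"Stanford - Chile: {gap_chile:.2f} points")
print(f"Stanford - UCM:   {gap_ucm:.2f} points")
```

Computed without intermediate rounding, the Stanford/Chile gap is 5.05 points; rounding the averages first (81.4 and 76.3) gives the 5.1 quoted in the text. Either way, these are differences in percentage points on a two-alternative test, not relative percentages.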
So as the paper tells us, the Spanish-speaking and bilingual students showed a small but statistically significant "selective impairment" for memory of the agents of accidental events, whereas the English-speaking students did not.
But there are some issues of description here. First, maybe this is a fact about the schools and not about the languages. Maybe Stanford students just tend to be more oriented toward liability for accidental events than students at Universidad de Chile or University of California Merced tend to be. Of course, this could be tested by doing the same experiment with some other Spanish-speaking vs. English-speaking populations.
The UCM students ought to give us some leverage on the difference between language and culture here, because
Half of the bilinguals completed the experiment in English and half in Spanish.
However, the memory results for the bilingual (UCM) subjects are (oddly) not broken out according to the language in which the experiment was administered — I take it that this means that the results were not significantly affected.
So what's the reason for thinking that language is involved with this at all? Well, the experiment also had an "event description phase":
Each participant described exactly the same videos s/he had seen during the learning phase of the agent memory study. In each description trial, participants viewed a video and were then prompted to answer the question “What happened?” (“¿Qué pasó?”). Participants typed their responses to these questions and received no feedback.
For each event video and for each language, both agentive ("he popped the balloon") and non-agentive ("the balloon popped") descriptions were in principle available. But the three subject groups showed systematic differences in their proportions of agentive and non-agentive descriptions, with the Stanford students describing accidental events agentively more often than either the Chilean or the bilingual students.
So this suggests that the relative salience of agents in subjects' linguistic descriptions is more or less consistent with their ability to remember who the agents were. But still, is this a fact about speaking English vs. speaking Spanish? Or is it a fact about cultural differences in attitudes toward responsibility for accidental events? After all, both agentive and non-agentive descriptions are available to speakers of both languages.
Here the results for the bilingual subjects might help us out, since half of them took the test in English and half of them in Spanish. This still doesn't really distinguish between language and culture, since the choice of language might well prime different cultural patterns (as it did in some other experiments). But in this case, we don't even get quite that far:
Bilingual description patterns did not vary by task language.[2] Subsequent analyses therefore considered the bilingual sample as a whole.
[2] A trend to use more agentive language when describing accidents in English (M=73.0) than in Spanish (M=63.2), p = .07, suggests that more data may increase power to detect an effect of local linguistic context. Ongoing research will help to address this issue.
Summing up, we have a small difference in memory for agents of accidental events, correlated with a difference in propensity to mention the agent in a description of the event. So far, this seems best described as a small cultural difference in the salience of accidental agents. It leaves open the question of whether the difference in proportion of agentive descriptions is playing any causal role at all in the memory results. So Fausey and Boroditsky did a second experiment, presumably at Stanford, to see whether linguistic priming of agency would affect memory results.
60 English speakers (32 agentive prime, 28 non-agentive prime) received course credit or were paid for participation.
In the priming phase of the experiment:
Participants in each condition listened to 24 sentences, either all agentive (e.g., She burned the toast) or all non-agentive (e.g., The toast burned). No verbs that could describe actions in the agent memory task were used. [...]
While listening to each sentence, participants viewed an image that contained two pictures: the beginning and the end state of the affected object. For example, people who heard She burned the toast or The toast burned viewed a screen with a piece of bread on the left and a burned piece of bread on the right. [...]
Participants were instructed to click on the picture that the sentence described, making the task and correct response identical in each prime condition.
All of the participants were then given the same agent-memory task as in Experiment 1. As predicted, subjects who were primed with agentive sentences remembered agents (whether intentional or accidental) better.
This shows, unsurprisingly, that agentivity is one of the features for which this kind of priming works. (It would have been big news if the attempt to prime it had failed.) Thus it's plausible that what caused the differential performance for the different subject groups in Experiment 1 might have been a sort of self-priming, where an overall greater tendency to describe accidental events agentively in inner speech led to an overall better memory for accidental agents. Of course, it's at least equally plausible that a greater tendency to describe specific events agentively in inner speech led to better memory for those particular agents. (And it remains possible that Stanford students are simply more interested in liability for accidents, or better at remembering it, other things equal, than the other groups of students.)
One interesting point: if we compare across experiments, the effect of priming seems to be to interfere with agent memory for the subjects in the nonagentive priming group, rather than to improve agent memory in the agentive priming group. Thus in Experiment 1, the overall score of the Stanford subjects was 81.4%, whereas the subjects in Experiment 2 (presumably also Stanford undergrads in psychology courses) scored 78.1% with agentive priming, and 71.5% with nonagentive priming.
Anyhow, one sample of Spanish-speakers (from Chile) and one sample of English-Spanish bilinguals (from California) showed a small "selective deficit" in memory for the agents of accidental events. One sample of English speakers (Stanford undergrads, from all over the U.S.) did not show this deficit. However, priming by 24 non-agentive sentences was enough to create an even larger memory deficit (apparently for both intentional and accidental agents) among a similar sample of Stanford undergrads.
What can we conclude from this?
Certainly we should reinforce our prior belief that a small amount of short-term priming creates powerful (if presumably temporary) effects — exposure to 24 sentences, in this case, was enough to generate an effect roughly twice as large as the difference between being a monolingual English-speaking undergraduate at Stanford and being a monolingual Spanish-speaking undergraduate at the Universidad de Chile.
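The "roughly twice as large" comparison can be checked against the numbers given above. In this minimal Python sketch, the assumption (ours, not stated in the paper) is that the priming effect is measured as the drop from the Experiment 1 Stanford baseline to the Experiment 2 non-agentive-prime score, following the cross-experiment comparison made earlier in this post. Since the two experiments used different subject samples, this is only a rough comparison; the within-Experiment 2 contrast is computed as well for reference.

```python
# Percent correct (all agents), as reported above.
stanford_exp1 = (80.4 + 82.3) / 2  # Experiment 1, Stanford overall (about 81.35)
chile_exp1 = (78.5 + 74.1) / 2     # Experiment 1, Universidad de Chile overall (76.3)
agentive_exp2 = 78.1               # Experiment 2, agentive prime
nonagentive_exp2 = 71.5            # Experiment 2, non-agentive prime

language_gap = stanford_exp1 - chile_exp1        # Stanford vs. Chile, in points
priming_drop = stanford_exp1 - nonagentive_exp2  # assumed cross-experiment baseline
within_exp2 = agentive_exp2 - nonagentive_exp2   # agentive vs. non-agentive prime

print(f"Stanford vs Chile (Exp. 1):            {language_gap:.2f}")
print(f"Exp. 1 baseline vs non-agentive prime: {priming_drop:.2f}  "
      f"(ratio {priming_drop / language_gap:.2f})")
print(f"Agentive vs non-agentive (Exp. 2):     {within_exp2:.2f}  "
      f"(ratio {within_exp2 / language_gap:.2f})")
```

Measured against the Experiment 1 baseline, the priming effect is about 9.85 points versus a 5.05-point language-group gap, a ratio close to 2; the agentive vs. non-agentive contrast within Experiment 2 alone (6.6 points) gives a ratio of about 1.3.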
We also should certainly reinforce our prior belief that, as Lane Greene aptly put it, "language nudges thought (in certain circumstances)". Even modest statistical differences in the way that different language communities tend to express things may correlate with modest differences in the way that their members remember things, if the experimental circumstances are carefully calibrated to produce memory performance in a range that allows these effects to be measured.
But we should certainly not, in my opinion, conclude that there is "a different sense of blame in … Spanish".
In fairness to Lera Boroditsky, that sub-heading was presumably supplied by an editor at the WSJ. She phrases things this way:
In addition to space and time, languages also shape how we understand causality. For example, English likes to describe events in terms of agents doing things. English speakers tend to say things like "John broke the vase" even for accidents. Speakers of Spanish or Japanese would be more likely to say "the vase broke itself."
How accurate is this? It depends on what you think "more likely" means here. Specifically, in the study we've been discussing, the Stanford students described accidental events agentively 79.2% of the time, whereas for the Chilean students, it was 62.5%. So those particular Spanish speakers were still more likely to describe the accidental events agentively than non-agentively, though it's true that they were more likely to describe those events non-agentively than the Stanford students were. Whether this is a general fact about English and Spanish, or a more specific difference between those two student groups, remains to be determined.
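One way to make the two readings of "more likely" concrete is to convert the two reported percentages into odds. In this minimal Python sketch (the function name `odds` is ours), the Chilean students' odds of an agentive description are still above 1, so agentive descriptions of accidents were the majority for both groups. At the same time, the Stanford students' odds are a bit more than twice as high.

```python
def odds(p_percent: float) -> float:
    """Odds of an agentive description, given the percentage of agentive descriptions."""
    p = p_percent / 100
    return p / (1 - p)


# Percent agentive descriptions of accidental events, as reported above.
stanford, chile = 79.2, 62.5

print(f"Stanford:   {odds(stanford):.2f} agentive descriptions per non-agentive one")
print(f"Chile:      {odds(chile):.2f} agentive descriptions per non-agentive one")
print(f"Odds ratio: {odds(stanford) / odds(chile):.2f}")
```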
Boroditsky goes on to suggest that
Such differences between languages have profound consequences for how their speakers understand events, construct notions of causality and agency, what they remember as eyewitnesses and how much they blame and punish others.
In my opinion, the paper that we've been discussing does not go very far towards justifying this claim. On the contrary, the observed differences (in event-understanding, notions of causality and agency, and eyewitness memory) were small rather than profound, and may be due to differences in culture as well as (or perhaps more than) differences in language. As for differences in "how much they blame and punish others", that didn't come up, but Boroditsky's evidence for this is from another monolingual priming study (I think it's Fausey & Boroditsky, "Subtle linguistic cues influence perceived blame and financial liability", Psychonomic Bulletin & Review, 2010):
English speakers watched the video of Janet Jackson's infamous "wardrobe malfunction" (a wonderful nonagentive coinage introduced into the English language by Justin Timberlake), accompanied by one of two written reports. The reports were identical except in the last sentence where one used the agentive phrase "ripped the costume" while the other said "the costume ripped." Even though everyone watched the same video and witnessed the ripping with their own eyes, language mattered. Not only did people who read "ripped the costume" blame Justin Timberlake more, they also levied a whopping 53% more in fines.
As I suggested in my earlier post, if you're interested in these questions, you really should read the original papers where the experiments are documented. The body of research done by Boroditsky and her collaborators is extensive, careful, and interesting, and reprints are available on her excellent web site. But when you read these papers, as I suggested at the beginning of this post, you should ignore the conclusions until you're confident that you understand the facts.