Suppose you heard about a study "showing" that Ivy League students are more socially sensitive than students at public universities or students at private colleges not among the Ancient Eight. You'd be skeptical, I hope.
So you take a look at the study, and discover that the authors — themselves Ivy League grads — did five experiments.
In the first experiment, they chose three Harvard students who exemplify, in their opinion, the best characteristics of that fine institution, and three students from the University of Michigan, again selected to represent the authors' idea of what such students should be like. They then subjected these six students to a battery of tests of empathy and social intelligence, and found that the three Harvard students scored a bit better than the three Michigan students.
The other four experiments were similar. In the second experiment, the authors selected three Princeton students from among a few dozen student-government leaders, and compared them to three selected representatives of the University of Oregon football team, and three (in their opinion characteristic) young people who did not attend college at all. Experiment 3 tested six new students, three from Yale and three from the University of Arizona, again selected to represent the authors' opinion of what such students should be like. Experiment 4 re-used four of the students from Experiment 3, but substituted two new choices from the same pools. And Experiment 5 re-used five of the six students from Experiment 4, substituting for one participant who seemed on reflection not to be quite of the Right Kind.
At this point, you should be saying to yourself, Wait a minute, this is a total crock! Where was it published, in one of those fake take-the-money-and-run open-access journals?
No, the study on which I've based this description was published a few days ago in Science, the flagship journal of the American Association for the Advancement of Science. But I've disguised my description of the conclusions and procedures, to protect the guilty and to get you to engage your critical faculties. The paper compares "literary fiction" to "popular fiction" and non-fiction, not Ivy League students to students at public universities and less prestigious private colleges; and it compares short-term priming effects on readers, not the abilities of group members; but it does base its conclusions on experiments that compared three hand-selected exemplars of each general category. This is a design feature that would never be accepted in a competently taught undergraduate course.
We're talking about David Comer Kidd and Emanuele Castano, "Reading Literary Fiction Improves Theory of Mind", Science 10/3/2013:
Understanding others’ mental states is a crucial skill that enables the complex social relationships that characterize human societies. Yet little research has investigated what fosters this skill, which is known as Theory of Mind (ToM), in adults. We present five experiments showing that reading literary fiction led to better performance on tests of affective ToM (experiments 1 to 5) and cognitive ToM (experiments 4 and 5) compared with reading nonfiction (experiment 1), popular fiction (experiments 2 to 5), or nothing at all (experiments 2 and 5). Specifically, these results show that reading literary fiction temporarily enhances ToM. More broadly, they suggest that ToM may be influenced by engagement with works of art.
Needless to say, this study has gotten considerable media uptake. But what's the basis of the authors' conclusion that "literary fiction, which we consider to be both writerly and polyphonic, uniquely engages the psychological processes needed to gain access to characters’ subjective experiences"?
Here's their account of the selection of materials (from the Supplementary Materials).
Six texts (3 fiction, 3 nonfiction) were selected by the authors. Critical to the selection were the criteria that the works of fiction depicted at least two characters and the nonfiction primarily focused on a nonhuman subject. These criteria were used to focus on the effects of reading about individuals presented in literature compared to those of simply reading a well-written text. Two of the texts in the literary fiction condition, “The Runner” by Don DeLillo (38) and “Blind Date” by Lydia Davis (39), were written by contemporary award-winning authors. The third, “Chameleon”, was written by Anton Chekhov (40), an early master of the modern short story. The nonfiction texts were “How the Potato Changed the World” by Charles C. Mann (41), “Bamboo Steps Up” by Cathie Gandel (42), and “The Story of the Most Common Bird in the World” by Rob Dunn (43). Participants in each condition were randomly assigned to read one of the three appropriate texts.
Excerpts of the first several pages (8-11) of recently published novels were used as stimuli, with the stipulation that excerpts did not end in the middle of a scene or paragraph. In the literary fiction condition, participants read an excerpt from one of three recent finalists for the National Book Award for fiction [The Round House by Louise Erdrich (45), The Tiger’s Wife by Téa Obreht (46), and Salvage the Bones by Jesmyn Ward (47)]. In the popular fiction condition, participants read an excerpt from one of three recent bestsellers on Amazon [Gone Girl by Gillian Flynn (48), The Sins of the Mother by Danielle Steel (49), and Cross Roads by W. Paul Young (50)]. Participants in the control condition read no text.
Six new texts, 3 in each condition, were used. The stories in the popular fiction condition were selected from an edited anthology of popular fiction (29). They were also chosen to represent a range of genres, including science fiction [Space Jockey by Robert Heinlein], mystery [Too Many Have Lived by Dashiell Hammett] and romance [Lalla by Rosamunde Pilcher]. Stories in the literary fiction condition were selected from a collection of the 2012 winners of the PEN/O. Henry Award for short literary fiction (30). They included Corrie by Alice Munro, Leak by Sam Ruddick, and Nothing Living Lives Alone by Wendell Berry.
Four of the texts used in Experiment 4 were the same as those used in Experiment 3. Two new texts, Jane by Mary Roberts Rinehart (29, popular fiction) and Uncle Rock by Dagoberto Gilb (30, literary fiction), replaced Lalla (29) and Leak (30) from Experiment 3.
Five of the texts used in Experiment 5 were the same as those used in Experiment 4, but The Vandercook by Alice Mattison (30) replaced Nothing Living Lives Alone by Wendell Berry (30) in the literary fiction condition because it was shorter and so closer in length to the other texts.
It would be inappropriate to conclude anything about Harvard students vs. Michigan students based on tests of three representatives of each set, hand-picked by researchers who admittedly wanted to find a way to support their pre-existing belief that Harvard students are superior. And it's just as inappropriate to conclude anything about literary fiction vs. popular fiction, or literary fiction vs. nonfiction, based on a comparison of a very small number of short excerpts, selected by the researchers to be somehow typical or characteristic of the genre — especially given that the researchers chose the samples in an attempt to get exactly the results that they got.
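To see how little three hand-picked exemplars can tell us, here's a minimal simulation sketch. It assumes (hypothetically) that every text's "priming effect" is drawn from the same distribution regardless of genre — i.e., that there is no genuine genre difference at all — and shows that a researcher who cherry-picks the three most promising texts from one pool and compares them to three typical texts from the other can still manufacture an apparent effect. The pools, sizes, and effect values here are all invented for illustration, not taken from the paper.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: each "text" has a true priming effect drawn from the
# SAME distribution for both genres, i.e. no genuine genre difference.
def make_pool(n=50, mu=0.0, sigma=1.0):
    return [random.gauss(mu, sigma) for _ in range(n)]

literary_pool = make_pool()
popular_pool = make_pool()

# Hand-pick the three most promising "literary" texts, but take three
# ordinary "popular" texts: a spurious genre difference appears from noise.
picked_literary = sorted(literary_pool)[-3:]     # cherry-picked top 3
picked_popular = random.sample(popular_pool, 3)  # three typical items

diff = statistics.mean(picked_literary) - statistics.mean(picked_popular)
print(f"apparent 'genre effect' under the null: {diff:.2f} sd units")
```

The point is not that the authors consciously did exactly this, but that with three selected stimuli per condition, nothing in the design rules it out: the stimuli are treated as if they were the genres themselves, rather than as a (tiny, non-random) sample from them.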
By the way, despite the authors' care in selecting the rabbits to place into their hat, the resulting effects are rather small:
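For a sense of scale on what "rather small" means, here's a quick sketch of the "common-language" interpretation of a standardized mean difference: for two normal distributions separated by Cohen's d, the probability that a randomly chosen reader from the higher-scoring condition outscores one from the other is Φ(d/√2). The d values below are just Cohen's conventional benchmarks, not figures from the paper.

```python
import math

def superiority_probability(d):
    """P(random score from group A beats one from group B), for two
    unit-variance normal distributions whose means differ by Cohen's d.
    Equals Phi(d / sqrt(2)), written here via math.erf."""
    return 0.5 * (1 + math.erf(d / 2))

for d in (0.2, 0.5, 0.8):  # Cohen's conventional small / medium / large
    print(f"d = {d}: P(superiority) = {superiority_probability(d):.2f}")
```

Even a "medium" effect of d = 0.5 means the literary-fiction reader beats the comparison reader only about 64% of the time — barely better than a coin flip — and a small effect is closer to 56%.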
And the subjects' performance, at least in some cases, is overall rather poor — compare these DANVA norms:
Aside from the breathtaking overgeneralization (to all literary fiction vs. all popular fiction or all non-fiction, based on biased selection of a handful of probably atypical exemplars), there are other problems of interpretation.
Perhaps reading a short passage of self-consciously literary fiction reminded the Mechanical Turk test-takers of the experience of being in school, forced to read things they didn't understand and didn't much care for, and this put them in a mindset to attend more dutifully to a subsequent test whose goals also seemed arbitrary and mysterious.
Or maybe the Turkers who read non-fiction, popular fiction, or nothing were distracted by what they'd just read, or by intrusive thoughts from their daily life, whereas those who read passages from literary fiction were lulled by boredom into a state of mental blankness in which their responses to tests like RMET and DANVA were a bit more stimulus-driven.
These theories are not very likely, in my opinion, but I feel that they're just about as well supported by the reported experiments as the authors' conclusions are.
The real question here is why Science chose to publish a study with such obvious methodological flaws. And the answer, alas, is that Science is very good at guessing which papers are going to get lots of press; and that, along with concern for their advertising revenues from purveyors of biomedical research equipment and supplies, seems empirically to be the main motivation behind their editorial decisions.