In other non-replication news lately: There's been a pretty sizable kerfuffle this month in social psychology and science blogging corners over a recent failure to replicate a classic 1996 study of automatic priming by John Bargh, Mark Chen, and Lara Burrows. The non-replication drew the attention of science writer Ed Yong, who blogged about it over at Discover, and naturally, of John Bargh, who elected to write a detailed and distinctly piqued rebuttal at Psychology Today.
The original paper reported three experiments; the one that's the target of controversy used a task in which subjects unscramble lists of words and isolate one word in the list that doesn't fit into the resulting sentence. The Bargh et al. study showed that when the experimental materials contained words that were associated with stereotypes of the elderly (e.g. Florida, bingo, gray, cautious), subjects walked more slowly down the hall upon leaving the lab compared to subjects who saw only neutral words. The result has been energetically cited, and has played no small role in spawning a swarm of experiments documenting various ways in which behavior can be impacted by situational or subliminal primes. The authors explained their findings by suggesting that when the concept of a social stereotype is activated (e.g. via word primes), this can prompt behaviors that are associated with that stereotype (e.g. slow walking).
But allegedly, despite scads of studies that have built on some of Bargh et al.'s conclusions, the slow-walker study has yet to be fully replicated, which motivated Stéphane Doyen and colleagues at the Université Libre de Bruxelles to undertake the job, reporting their attempts in a recently-published article in PLoS ONE. Their first experiment, which incorporated sober experimental precautions such as using automated timing systems and ensuring that experimenters were blind as to which conditions subjects were assigned to, failed to produce a priming effect. This led them to wonder in print whether the original priming results could have come from a failure to strictly implement double-blind experimental methods, a question that served as the motivation for their second experiment.
The second study reported by Doyen et al. focused on whether an effect could be induced by specifically manipulating the experimenters' expectations of how the subjects would behave. Ten different experimenters were included; these experimenters were made aware of which of their subjects were assigned to the word prime condition, and which were assigned to the neutral word condition. However, half of them were led to expect that when primed, their subjects would walk more slowly as a result of the experimental manipulation, and half of them were led to believe they would walk more quickly. (In reality, all subjects received the same elderly-priming materials as in the first experiment). The paper doesn't go into detail as to how these experimenter expectations were established, other than to report that all this took place during "a one hour briefing and persuasion session prior to the first participant's session." In addition, the experimenters had their expectations reinforced by the behavior of their very first study subject, who was a confederate in cahoots with the researchers and obligingly walked quickly or slowly, as expected.
Not surprisingly, when subjects' walking speed was measured by the experimenters themselves on a stopwatch, their pace aligned with expectations: subjects in the word prime condition were timed at faster speeds than those in the neutral word condition when the experimenters expected that priming would speed them up, and conversely, when experimenters expected priming to slow the subjects down, they timed them at slower speeds in the word prime condition relative to the neutral word condition. This wasn't the whole story, though—the subjects' actual speed was also timed by an automated motion-sensitive system. Objective measures of walking speed showed that when the experimenters expected priming to accelerate their subjects, that's exactly what happened. But when they expected subjects to slow down as a result of the priming, there was no difference between the primed subjects and those in the neutral word condition.
This tells us that the actual walking speed of subjects isn't determined entirely by experimenters' expectations; if it were, subjects should also have walked more slowly when the experimenters expected priming to slow them down. But it does suggest that the priming effect can be either boosted or dampened by experimenter expectations, presumably because the experimenter emits subtle and possibly inadvertent cues that affect the subjects' behavior (it would have been interesting, for example, to measure the experimenters' speech rate).
The authors' take on all this is to conclude that:
although automatic behavioral priming seems well established in the social cognition literature, it seems important to consider its limitations. In line with our result it seems that these methods need to be taken as an object of research per se before using it can be considered as an established phenomenon.
I'm really not sure what the above statement actually means. But it certainly invites a first-blush response of the Ohmygosh-is-all-this-stuff-we-thought-we-knew-about-unconscious behavioral-priming-wrong? variety. Still, it's worth waiting for that first flush to settle. Because in the end, the result in and of itself causes little trauma to the original Bargh et al. interpretation of their priming data, and none whatsoever to the more general issue of whether automatic behavioral priming exists.
First of all, the fact that experimenter expectations had an effect on subjects' behavior doesn't mean that they account for the original Bargh et al. results. It just means that experimenter expectation has a measurable impact on any priming effects that may or may not occur. To find otherwise would be rather surprising, especially given the rather heavy-handed way in which these expectations seem to have been induced. (Bargh has countered the paper by claiming that his original study did in fact implement double-blind methods; whether or not this was done rigorously enough, it certainly seems clear that the later Doyen et al. paper went to special lengths to create a salient experimenter bias above and beyond what would plausibly have existed in the earlier work.)
So what we're really left with is the issue of how to interpret the non-replication. There are a number of possible reasons for this, some of them really boring, some of them mildly interesting, but most of them unrelated to the important theoretical questions. For example:
1. The non-replication itself is an experimental failure. In experiments involving humans, all "replications" are at best approximate. Other unforeseen aspects of the experimental design and implementation may have obscured a priming effect or led to unusually noisy data. For example, maybe the Belgian experimenter was attractive to the point of distraction. Maybe more of the undergraduate subjects were tested in the morning while still sluggish. Maybe the experimenter was flaky and inconsistent in implementing the study. Obviously, if an effect is repeatedly vulnerable to these kinds of obliterations, that can speak to the fragility of the effect; but the point is that for any single failure to replicate, we can't tell for sure what the source of the non-replication is. Perfectly robust results can be and often are drowned in noise inadvertently introduced somewhere in the experimental procedure. We can simply document that the failure to replicate occurred, while noting (and further testing) any obvious discrepancies from the original implementation.
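The fragility of a single replication attempt is easy to make concrete with a toy Monte Carlo sketch. The numbers below (a 7-second baseline walk, a half-second slowdown, 15 subjects per group) are hypothetical illustrations, not figures from either paper; the point is only that a perfectly real effect, measured with noisy timing and modest samples, will often fail to reach significance.

```python
import random
import statistics

random.seed(42)

def run_study(n_per_group, true_effect, noise_sd):
    """Simulate one two-group walking-time study (seconds).

    Returns True if the group difference crosses a rough
    |t| > 2 significance threshold (~p < .05, two-tailed).
    """
    control = [random.gauss(7.0, noise_sd) for _ in range(n_per_group)]
    primed = [random.gauss(7.0 + true_effect, noise_sd) for _ in range(n_per_group)]
    diff = statistics.mean(primed) - statistics.mean(control)
    se = statistics.stdev(control + primed) * (2 / n_per_group) ** 0.5
    return abs(diff / se) > 2

# A genuinely real 0.5 s slowdown, 15 subjects per group, 1 s timing noise:
successes = sum(run_study(15, 0.5, 1.0) for _ in range(1000))
print(f"Detected the (real) effect in roughly {successes / 10:.0f}% of 1000 replications")
```

With these assumed numbers, only somewhere around a quarter to a third of simulated replications detect the effect, so any single failed replication is weak evidence on its own.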
2. The word primes may not have successfully triggered a stereotype for the elderly in the minds of the subjects, or the conceptual stereotype may not have had a strong association with slow walking movements. It's entirely conceivable that stereotypes would shift due to time or geographic location. A lot has happened demographically since 1991 when Bargh et al. first collected their data. Upon hearing about this study, for example, my own son remarked (referring to his alpine-skiing, Nepal-trekking grandmother): "Those subjects have obviously never met Nanny." In this case, there's no threat to Bargh's original theoretical contribution about the activation of social stereotypes as a driver of behavior; it's just that any given stereotype isn't going to be held by all populations.
3. There was nothing wrong with the stereotypes; the original result really was a statistical fluke, or an experimental artifact, or limited to a very narrow population or set of experimental circumstances. This eventuality is the most damaging to Bargh et al. But does it really threaten the more general conclusion that behavior can be unconsciously, or automatically, primed? No; it simply casts doubt on the more specific interpretation of the results as being due to the activation of social stereotypes. In fact, it's hard to interpret Doyen et al.'s second study, which manipulated experimenter expectations, without appealing to unconscious behavioral priming (as fairly pointed out by Ed Yong in his post). Unless the experimenters actually violated experimental ethics outright by instructing the subjects to walk more slowly, it seems likely that the subjects were picking up on cues (but which ones? Speech rate? Certain words?) unconsciously emitted by the experimenters. What's more, there are by now dozens and quite possibly hundreds of demonstrations of automatic priming effects using a variety of different experimental paradigms, some of which do involve the activation of stereotypes. (Some examples here and here.) Given that it's now 2012, not 1996 when the Bargh et al. paper first appeared, any non-replication of that original result is going to have to be interpreted within the context of that entire body of work.
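The statistical-fluke possibility in item 3 can be sketched the same way. In this toy simulation (again with hypothetical numbers, not either study's data), both groups are drawn from the identical distribution, so there is no true effect at all; a fixed significance threshold still lets a predictable handful of null studies through.

```python
import random
import statistics

random.seed(0)

def spurious_hit(n_per_group=15, noise_sd=1.0):
    """One null study: both groups drawn from the same distribution.

    Returns True if the study nonetheless crosses a rough
    |t| > 2 significance threshold (~p < .05, two-tailed).
    """
    a = [random.gauss(7.0, noise_sd) for _ in range(n_per_group)]
    b = [random.gauss(7.0, noise_sd) for _ in range(n_per_group)]
    diff = statistics.mean(b) - statistics.mean(a)
    se = statistics.stdev(a + b) * (2 / n_per_group) ** 0.5
    return abs(diff / se) > 2

flukes = sum(spurious_hit() for _ in range(2000))
print(f"{flukes} of 2000 null studies look 'significant' by chance alone")
```

Roughly one in twenty null studies clears the bar, which is exactly why a single striking result, from any lab, is an opening move rather than a settled fact.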
So. Hardly material to launch a full-scale kerfuffle. This is just science plodding its plodding way towards its plodding approximation of truth. Enough with the rubbernecking already—there are no bloody conclusions to be found here, at least not yet.
So why am I bothering to add my voice to the fray? Because I think that it's very important that we actually talk about replication, what it means and doesn't mean, and that we do so in a way that moves beyond thinking about it as a cagematch between scientists.
When I talk to non-scientists, I'm distressed by a general illiteracy in the understanding of non-replication. All too often, failures to replicate are treated as abrupt reversals of truth, as if any new result, especially a startling or counterintuitive one, were a declaration of truth rather than an opening gambit. New studies, whether they replicate the result or not, are simply the next moves that change the way the board is now configured. Yet a failure to replicate is routinely portrayed as an instance of science "changing its mind," or as an indictment of the scientific method, when really, it's at the heart of the scientific method. When it comes down to it, the sound of non-replication isn't the sound of the puck being slapped into the opponent's net. It's the sound of a muttered "hmmm, what's going on here," the sound of science rolling up its sleeves with a sigh and settling in for a long night's work.