More on the monkey business at Harvard


Nicholas Wade, "In Harvard Lab Inquiry, a Raid and a 3-Year Wait", NYT 8/13/2010, gives some additional information about the Marc Hauser scandal.  The new information is still basically rumor — the result of interviews with sources both anonymous and not, with the non-anonymous information being largely second hand. But it all suggests that whatever happened is more serious than just a bit of careless record-keeping.

Key points include a "raid" by "Harvard authorities":

Marc Hauser’s academic career was soaring when suddenly, three years ago, Harvard authorities raided his laboratory and confiscated computers and records.

There are apparently "eight charges", though it's not clear yet what they are:

In January this year, a faculty committee at last completed its report, said to contain eight charges against Dr. Hauser. But the report was kept secret and nothing changed until this month when someone showed The Boston Globe a letter about the investigation from Dr. Hauser to his faculty colleagues.

There's some general confirmation of the notion that monkeys can be very difficult subjects for the sort of experiments he does with them, and that the relevant data collection is often at best very subjective:

The captive animals, a colony of some 40 cotton-topped tamarins, may have contributed to the difficulties in Dr. Hauser’s laboratory. It is difficult to get the tamarins to pay attention, especially after the monkeys get used to experimenters.

“With some of these methods it was never clear to me how one could obtain meaningful results,” said a person with experience in Dr. Hauser’s lab, who requested anonymity for fear of retaliation. “The monkeys were often either jumping around, or not moving at all, and you rarely got the sense of an unambiguous response.”

A typical experimental paradigm is habituation/discrimination, where the monkeys are exposed to stimuli of a certain sort for a while, and then later exposed to stimuli of a different sort. The dependent variable is some measure of how much attention they pay to the novel stimuli.  This measure is subjectively coded, by direct observation or by inspection of video recordings. To be believable, the coding should be "blind" — that is, the coders should not know which experimental category they are coding.  In the versions of such procedures used for studying auditory pattern perception in infants, precautions are generally taken to ensure that those who are coding a subject's attention are unaware of the nature of the trial being coded — see e.g. Janet Werker et al., "The conditioned head turn procedure as a method for testing infant speech perception", Early Dev Parent 6:171-8 (1997). Unfortunately, this is by no means the norm in all areas of experimental psychology; in fact I'd say that "blind" coding of behavior is the exception rather than the rule, though I don't have counts to show this.
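To make concrete what "blind" coding amounts to in practice, here is a minimal sketch of one common precaution: stripping condition labels from the video files before they reach the coders, with a separate key kept for un-blinding afterwards. The file names and conditions are hypothetical, and this is not a description of any particular lab's actual protocol.

```python
import random

def blind_videos(videos, seed=None):
    """Given (video_file, condition) pairs, return anonymized names
    for the coders plus a key for un-blinding after coding is done."""
    rng = random.Random(seed)
    shuffled = list(videos)
    rng.shuffle(shuffled)  # destroy any ordering cue to the condition
    key = {}
    for i, (fname, condition) in enumerate(shuffled):
        anon = f"trial_{i:03d}.mov"  # no condition label in the name
        key[anon] = (fname, condition)
    return sorted(key), key

# Hypothetical session: habituation-phase vs. novel-stimulus videos.
videos = [("subj07_hab.mov", "habituation"),
          ("subj07_test.mov", "novel")]
coder_list, key = blind_videos(videos)
```

The point is simply that the coder sees only `trial_000.mov`, `trial_001.mov`, and so on, and cannot tell which trials are habituation and which are test.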

As far as I can recall, Hauser's papers don't specify blind coding. [Update — I was wrong, and should have checked first — e.g. Fitch and Hauser "Computational Constraints on Syntactic Processing in a Nonhuman Primate", Science 2004 says that the tamarins' "latency and duration of looking … were later scored blind to condition from the digitized video (>90% reliability)".]
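The ">90% reliability" figure presumably refers to inter-coder agreement. The paper doesn't say how it was computed; as a rough illustration (hypothetical codes, and simple percent agreement rather than whatever statistic Fitch and Hauser actually used), two independent blind coders' categorical judgments can be compared trial by trial:

```python
def percent_agreement(coder_a, coder_b):
    """Fraction of trials on which two independent coders assign
    the same category (e.g. 'look' vs. 'no')."""
    if len(coder_a) != len(coder_b):
        raise ValueError("coders must score the same trials")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Hypothetical codes for ten trials from two blind coders.
a = ["look", "look", "no", "look", "no", "look", "look", "no", "look", "look"]
b = ["look", "look", "no", "look", "no", "look", "no", "no", "look", "look"]
print(percent_agreement(a, b))  # 0.9
```

Note that raw percent agreement can be inflated when one category dominates; chance-corrected measures like Cohen's kappa are often preferred for exactly that reason.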

If the  behavior to be coded is exceptionally hard to interpret, or even basically uninterpretable in some cases, then obviously this opens the door to a whole spectrum of problems, from the sort of "honest cheating" that tends to happen whenever subjective judgments are made by people who have beliefs about how the results should come out, to out-and-out fraud.

[If the behavior really was scored blind with >90% inter-annotator agreement, then there should be little or no reason for concern on this score. But then the quote in Wade's article (“With some of these methods it was never clear to me how one could obtain meaningful results. […] The monkeys were often either jumping around, or not moving at all, and you rarely got the sense of an unambiguous response.”) would be completely irrelevant. It bears repeating that we don't have very much evidence here, but the only concrete testimony that we do have — the 1995 mirror experiment and the withdrawn 2002 paper — says that at least the documentation of the scoring of behavior was inadequate, and that in some cases the scoring could not be replicated from the video, or the video itself was missing.]

This is one of the reasons why (in my opinion) raw data should always be published.

Wade's article also gives confirmation of some (at best) careless record-keeping:

Other experimental problems have come to light with three articles investigated by the Harvard committee. In two, the supporting data did not exist. Dr. Hauser and a colleague repeated the experiments, and say they got the same results as published. In a third case, Dr. Hauser retracted an article published in the journal Cognition in 2002 but gave the editor no explanation of his reason for doing so.

And also confirmation (or at least presentation in print) of rumors that I earlier chose not to repeat, about the role of grad students and other junior lab members in bringing the problems to light:

Whatever the problems in Dr. Hauser’s lab, they eventually led to an insurrection among his staff, said Michael Tomasello, a psychologist who is co-director of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, and shares Dr. Hauser’s interest in cognition and language.

“Three years ago,” Dr. Tomasello said, “when Marc was in Australia, the university came in and seized his hard drives and videos because some students in his lab said, ‘Enough is enough.’ They said this was a pattern and they had specific evidence.”

My own feeling, for what it's worth, is that spectacular over-interpretation in the service of ambition ought to be as much a cause for concern as tendentious subjective data coding and poor record-keeping.  Putting it cynically, if you invent exciting explanations that extrapolate way beyond relatively pedestrian data, and you still get your papers published in Science and Nature, then maybe it doesn't matter so much whether your results are reproducible or even whether the data ever actually existed.

On the other side, Elyssa A. L. Spitzer, "Harvard Keeps Mum as Scientists Call for Transparency in Probe", Harvard Crimson 8/14/2010, quotes Steve Pinker in a very supportive vein:

“Marc is a beloved scientist, teacher, and colleague,” Hauser’s friend and colleague, psychology professor Steven A. Pinker wrote in an e-mail Thursday. Pinker has not collaborated with Hauser on any published studies. “He is widely admired not just for his astonishing breadth and creativity in devising ways to investigate deep problems with elegant experiments, but for his warmth, humor, and lack of pretension.”

Hauser did not respond to an interview request, but Pinker said his colleague will continue to head his lab next year while on leave.


  1. John Lawler said,

    August 14, 2010 @ 12:09 pm

    Another NYT article, in the Education section, refers to the "ripple effect" that this brouhaha may have on other fields. Linguistics isn't mentioned, but Robert Seyfarth, a Penn researcher in animal behavior, is quoted.

    [(myl) Yes, Robert Seyfarth and Dorothy Cheney, both now at Penn, were Hauser's grad advisors. Robert is quoted in the most recent article as well:

    Dr. Hauser, 50, was trained by two researchers renowned for the rigor of their field work on animal behavior, Robert Seyfarth and Dorothy Cheney of the University of Pennsylvania. “Marc was our first graduate student,” Dr. Seyfarth said. “But many years ago, we decided that Marc’s way of doing things and ours were not really the same. We just differed about our approach to research.”

    One reason, Dr. Seyfarth said, was that he and Dr. Cheney studied animals in natural conditions, where the pace of data collection is much slower, whereas Dr. Hauser had moved into studying captive animals.

    It's worth noting that Robert and Dorothy are not only "renowned for the rigor of their field work"; over the past decade, they have also begun publishing their raw data, e.g. here, and encouraging their students and colleagues to do so as well.]

  2. John said,

    August 14, 2010 @ 12:45 pm

    "a different sort of stimulus", no?

    [(myl) Not really. In the habituation/discrimination paradigms I'm familiar with, the habituation phase involves a fairly long period during which a sequence of stimuli drawn from some distribution is played. Then during the discrimination phase, a sequence of stimuli drawn from some other distribution is played, and some measure of attention (such as the proportion or duration of gaze oriented to some specified target) is coded.

    If "a different sort of stimuli" bothers you for some reason, substitute "stimuli of a different sort".]

  3. Margaret L said,

    August 14, 2010 @ 1:44 pm

    I'm a little surprised at the statement that "this [blind coding] is by no means the norm in psychology." I'm not in developmental, but my impression is that blind coding is de rigueur in that crowd. I assumed that this kind of rigour had been imported into comparative psychology by the people who do cross-over human-dev/primate work. Maybe I've been naive.

  4. Taylor Selseth said,

    August 14, 2010 @ 4:17 pm

    Yikes! I have his book "Moral Minds", this really hurts my opinion of him as a scientist.

  5. Anon Shmanon said,

    August 14, 2010 @ 9:12 pm

    The reason for the missing records is not explained in the Proceedings of the Royal Society B. Similarly, there is no explanation for why “data do not support the reported findings’’ in the Cognition paper that is being retracted. But Harvard’s policy describing its procedures for responding to allegations of misconduct in research notes that “research misconduct does not include honest error or differences of opinion.’’

    This seems to rule out most of the benign explanations for the retractions.

  6. baby researcher said,

    August 14, 2010 @ 11:40 pm


    Blind coding is the norm, not the exception, in research using the habituation/dishabituation method.

    [(myl) This is true, at least of the CHT method as described by Janet Werker et al., "The conditioned head turn procedure as a method for testing infant speech perception", Early Dev Parent 6:171-8 (1997). I've changed the body of the post to reflect this fact.]

  7. Other Anon Dude said,

    August 15, 2010 @ 12:19 am

    Is Tomasello just a real straight-shooter, or is there animosity here? It's striking that such a significant peer would make that level of comment at this stage.

  8. A reader said,

    August 19, 2010 @ 11:22 am

    More on this stuff has just come to light:

  9. Sivi said,

    August 19, 2010 @ 11:47 am

    Based on more recent reports about how things were going with his students, it sounds like one of those things where people knew something sketchy might be going on, but either didn't have evidence or were not in a position to do anything about it.
