More details on the Marc Hauser case


Tom Bartlett, "Document Sheds Light on Investigation at Harvard", Chronicle of Higher Education 8/19/2010:

Ever since word got out that a prominent Harvard University researcher was on leave after an investigation into academic wrongdoing, a key question has remained unanswered: What, exactly, did he do? […]

An internal document, however, sheds light on what was going on in Mr. Hauser's lab. It tells the story of how research assistants became convinced that the professor was reporting bogus data and how he aggressively pushed back against those who questioned his findings or asked for verification.

A copy of the document was provided to The Chronicle by a former research assistant in the lab who has since left psychology. The document is the statement he gave to Harvard investigators in 2007.

Bartlett's anonymous source paints an alarming picture of practices and relationships in Hauser's lab:

According to the document that was provided to The Chronicle, the experiment in question was coded by Mr. Hauser and a research assistant in his laboratory. A second research assistant was asked by Mr. Hauser to analyze the results. When the second research assistant analyzed the first research assistant's codes, he found that the monkeys didn't seem to notice the change in pattern. In fact, they looked at the speaker more often when the pattern was the same. In other words, the experiment was a bust.

But Mr. Hauser's coding showed something else entirely: He found that the monkeys did notice the change in pattern—and, according to his numbers, the results were statistically significant. If his coding was right, the experiment was a big success. […]

The research assistant who analyzed the data and the graduate student decided to review the tapes themselves, without Mr. Hauser's permission, the document says. They each coded the results independently. Their findings concurred with the conclusion that the experiment had failed: The monkeys didn't appear to react to the change in patterns.

They then reviewed Mr. Hauser's coding and, according to the research assistant's statement, discovered that what he had written down bore little relation to what they had actually observed on the videotapes. He would, for instance, mark that a monkey had turned its head when the monkey didn't so much as flinch. It wasn't simply a case of differing interpretations, they believed: His data were just completely wrong.

As word of the problem with the experiment spread, several other lab members revealed they had had similar run-ins with Mr. Hauser, the former research assistant says. This wasn't the first time something like this had happened. There was, several researchers in the lab believed, a pattern in which Mr. Hauser reported false data and then insisted that it be used.

Let me say again: In addition to the obvious "best practices" of blind coding and careful calibration of inter-coder agreement, there's no longer any excuse not to publish the raw data from experiments like these.
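
One standard way to calibrate inter-coder agreement is a chance-corrected statistic such as Cohen's kappa. Here is a minimal sketch in Python; the coders' labels are invented for illustration:

    # Cohen's kappa for two coders' labels (made-up example data).
    from collections import Counter

    def cohens_kappa(coder1, coder2):
        """Chance-corrected agreement between two equal-length label lists."""
        n = len(coder1)
        observed = sum(a == b for a, b in zip(coder1, coder2)) / n
        c1, c2 = Counter(coder1), Counter(coder2)
        expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
        return (observed - expected) / (1 - expected)

    coder1 = ["look", "look", "no_look", "no_look", "look", "no_look"]
    coder2 = ["look", "no_look", "no_look", "no_look", "look", "no_look"]
    print(round(cohens_kappa(coder1, coder2), 2))  # 0.67

Raw percent agreement (like the ">90% reliability" figure quoted in the comments below) can overstate reliability when one response category dominates, which is why chance correction matters.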

Guarding against fraud, or the suspicion of fraud, is only one of the many reasons that publishing the raw data is a good idea.

[Update — more coverage: Greg Miller, "Investigation Leaves Field in the Dark About a Colleague's Work", Science 20 August 2010; Derek Bickerton, "Why Hauser Did It: Scientific dogma, not Hauser, is to blame for misconduct", Psychology Today 8/19/2010; "Harvard Probes Claims of Scientific Misconduct", NPR 8/18/2010.]



24 Comments

  1. Twitter Trackbacks for Language Log » More details on the Marc Hauser case [upenn.edu] on Topsy.com said,

    August 19, 2010 @ 12:21 pm

    […] Language Log » More details on the Marc Hauser case languagelog.ldc.upenn.edu/nll/?p=2565 […]

  2. john riemann soong said,

    August 19, 2010 @ 12:32 pm

    Where should the raw video (often gigabytes, even terabytes, of data) be published? YouTube?

    (Although I guess with cell biology, high-resolution images are important.)

    [(myl) There are many possibilities: University archives and journals are two of the obvious choices. And "published" doesn't necessarily mean "available on line". It might mean "send us $100 and we'll send you the data".

    Of course, what I mean by "raw data" also includes the coding, whether subjective or automatically calculated.

    Note that the amount of data is not in fact enormous. Consider the procedure described here:

    Briefly, the tamarin colony was pseudorandomly divided into two groups, one per grammar. Each group included a mixture of sexes and ages (all adult). All of the monkeys in a particular group were simultaneously exposed in their home cages to 20 min of repeated playback of 60 different grammar-consistent strings, in random order, during the evening. They were then tested individually the next morning in a sound chamber. Testing started with a re-familiarization phase, when random stimuli from the previous evening's session were again played back for 2 min while the animal was fed treats (at a rate determined by the animal's feeding, and uncorrelated with stimulus presentation). We then closed the sound chamber door, started video monitoring and recording, and began playback of the test stimuli. No food was delivered during testing. Playback was initiated by the observer when the animal was looking down and away from the loudspeaker, and latency and duration of looking (orientation towards the loudspeaker; Fig. 1B) were later scored blind to condition from the digitized video (>90% reliability). Each animal (regardless of the grammar on which they were trained) was tested with the same eight stimuli in random order. Four were novel stimuli consistent with the training grammar, whereas the other four were violations (but consistent with the other grammar).

    The details given here specify 13 tamarins in the colony — round it up to 15 — and a mean interstimulus interval of 23.1 seconds in the test phase — round it up to 30. Given 8 test stimuli per animal, that's 15*8*30 seconds of video data to be coded, or a total of exactly one hour of video for the whole experiment. Depending on the coding used, this would be something like 1 to 20 GB (though in fact the quality of video used in such experiments seems to be below the low end of that spectrum of rates), and would fit on a single optical disc for physical transfer. This is the same order of magnitude, after all, as the amount of data Netflix delivers to a single customer in a single transaction.
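
    As a quick sanity check, here is the same arithmetic as a short Python snippet (the two bitrates are illustrative assumptions spanning rough low- and high-quality video):

        # Back-of-the-envelope data volume for the whole experiment.
        animals = 15            # 13 tamarins, rounded up
        stimuli_per_animal = 8
        seconds_per_trial = 30  # mean ISI of 23.1 s, rounded up

        total_seconds = animals * stimuli_per_animal * seconds_per_trial
        print(total_seconds / 3600, "hours of video")  # 1.0

        for mbps in (2, 40):    # assumed video bitrates, megabits/second
            gb = total_seconds * mbps / 8 / 1000
            print(f"{mbps} Mb/s -> about {gb:.1f} GB")  # ~0.9 GB and 18 GB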

    It's also worth mentioning that a small amount of additional instrumentation would allow the motion and orientation of the test animal to be easily calculated automatically — in fact, current video analysis software is probably good enough to do this even without sticking extra sensors or other devices on the animals. Here's a picture of what the video frames look like:

    [image: sample video frame from a test session]
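
    Along those lines, here is a minimal sketch of the simplest version of the idea, scoring frame-to-frame motion with OpenCV. It is purely illustrative: the file name and threshold are invented, and real orientation scoring would need head tracking on top of this.

        # Crude motion scoring by frame differencing (illustrative only).
        import cv2
        import numpy as np

        VIDEO_PATH = "trial_video.avi"  # hypothetical input file
        MOTION_THRESHOLD = 8.0          # mean absolute pixel difference (assumed)

        def motion_scores(path):
            """Mean absolute gray-level difference between successive frames."""
            cap = cv2.VideoCapture(path)
            ok, prev = cap.read()
            if not ok:
                raise IOError("could not read " + path)
            prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
            scores = []
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                # Mean absolute difference is a crude proxy for animal movement.
                scores.append(float(np.mean(cv2.absdiff(gray, prev))))
                prev = gray
            cap.release()
            return scores

        scores = motion_scores(VIDEO_PATH)
        moving = sum(s > MOTION_THRESHOLD for s in scores)
        print(moving, "of", len(scores), "frames show above-threshold motion")
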
    ]

  3. Ellie said,

    August 19, 2010 @ 12:36 pm

    I was once a research assistant under a man who led the education program at a research institute, and my task was to rebuild a database of former students using raw data, and see if their participation in a research assistantship program had led to "success" in their future careers. When the data showed that the only students who had gone on to complete their PhD were white students from Tier I undergraduate schools, my boss pulled out photos from years gone by and assigned "blackness" to two students who were just "very tan" (they had self-identified as Caucasian and Hispanic on their program applications). "He looks black enough to me," was the best direct quote. I wish I was joking about this.

    When I left that internship I told the institute director what I had witnessed (among many other examples of very poor behavior). A decade out, he is still head of education at the same institute; I am working as a secretary while trying to somehow find my way back into academia.

    It is to the detriment of the research community that many bright, motivated young students and researchers may be weeded out due to having a bit too much integrity.

  4. Mr Punch said,

    August 19, 2010 @ 12:37 pm

    In this case, the divergent raw coding, without the actual videos, would have been sufficient to indicate that there was a problem.

  5. Lance said,

    August 19, 2010 @ 1:42 pm

    But divergent raw coding isn't raw data. If Hauser had published his coding and the second research assistant's coding, you wouldn't be able to tell which was accurate, only that two different people had looked at the tapes and drawn different conclusions. "Raw data" means "here, look at the actual result of the experiment and decide for yourself".

    [(myl) Not exactly. To me, "raw data" in a case like this would mean both the basic recordings and also subjective annotation, linked to the recordings via time-marks and to the (typically anonymized) coders via metadata.
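
    Concretely, one published coding record might look something like this (a hypothetical sketch; all field names are invented):

        # One subjective judgment, tied to the recording by time-marks
        # and to an anonymized coder by an ID (illustrative only).
        annotation = {
            "video_file": "trial_017.avi",   # the raw recording
            "start_time_s": 12.40,           # time-marks into the video
            "end_time_s": 14.85,
            "coder_id": "coder_A",           # anonymized coder metadata
            "blind_to_condition": True,
            "judgment": "oriented_to_speaker",
        }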

    This sort of publication has been routine in several other fields (speech and language technology; child language acquisition; geophysics) for decades.]

  6. Lance said,

    August 19, 2010 @ 2:17 pm

    Mark,

    Sorry, yes. I didn't mean that "raw data" doesn't include the coding/annotation. I was replying specifically to the comment above mine, in which Mr Punch claimed that the raw coding without the videos would have been sufficient to see that there were no actual results behind the paper. In this case, as far as I understand it, the raw coding wouldn't have been enough. (At the same time, though, you'd certainly also want to see the coding, so that in reviewing the paper you can do exactly what the second research assistant did: draw your own rough conclusions about how the coding should go, and look at Hauser's coding to see how accurate you think it is.)

  7. bianca steele said,

    August 19, 2010 @ 4:30 pm

    Somehow the automation/computerization of the video analysis feels like cheating, though I personally find it interesting as a programming challenge. While not the primary purpose of these experiments, surely, one aspect of their importance is in training students to think like scientists (in effect, the machine would seem to be doing their thinking for them).

    [(myl) Interesting. Do you also think that measuring reaction times with a computer, rather than a stopwatch, is cheating?]

  8. dwmacg said,

    August 19, 2010 @ 4:43 pm

    I suppose one silver lining in this cloud is that, if the anonymous student's story is accurate, the students were thinking like scientists. Somewhere they got the right training.

  9. anonymoose said,

    August 19, 2010 @ 5:22 pm

    @Bianca. On the contrary, to me it's not automating that feels like cheating. With automatic methods you can ensure that the analysis can be replicated; when things are done by hand or by eye, it's hard to replicate bored grad students squinting at a monitor. I think I follow what you mean, though: needing to earn your chops by being knee-deep in data.

  10. bianca steele said,

    August 19, 2010 @ 5:58 pm

    myl:
    On the one hand, testing using a stopwatch seems very trivial and actually a pretty good candidate for computerization. On the other hand, it seems like it would be a useful skill for an experimenter to have, and so would be learning how to manipulate the data to account for experimenter effects. Comparing two versions of an experiment–a modern one in which a computer presents a stimulus and a computer user responds using an input device (with the computer recording the time of the response), and a traditional one in which an experimenter presents a stimulus visible to both parties, observes when the subject triggers some kind of mechanical "input device," and writes the time down on a prepared worksheet–I don't see any result from the traditional version that you wouldn't get from the modern one. In fact, the modern one would automatically be more accurate. Maybe there would be subjective information conveyed during the more personal version of the experiment, which could conceivably end up in the final results.

  11. Doug said,

    August 19, 2010 @ 6:09 pm

    This is peripheral, but I find it odd that he's referred to as 'Mr. Hauser' rather than 'Dr' or 'Prof.'

    [(myl) That's indeed rather odd, especially for the Chronicle of Higher Education.]

  12. Will said,

    August 19, 2010 @ 6:31 pm

    @Doug, I didn't even notice that. I wonder whether that was just carelessness or indifference on the part of the writer, or whether it was intended as an implicit rejection of those titles.

  13. Neal Goldfarb said,

    August 19, 2010 @ 11:20 pm

    Wow, I can't believe nobody noticed this.

    An internal document, however, sheds light on what was going on in Mr. Hauser's lab. It tells the story of how research assistants became convinced that the professor was reporting bogus data and how he aggressively pushed back against those who questioned his findings or asked for verification.

    "bogus data"

    "bogus data"

    "bogus"

    Someone alert Simon Singh.

  14. richard howland-bolton said,

    August 20, 2010 @ 7:14 am

    @Neal
    Well as long as you don't say anything to the BCA everything should be just fine.

  15. Sally Thomason said,

    August 20, 2010 @ 11:14 am

    @ Doug, about the address form "Mr. Hauser":

    I assumed that that was the usual Harvard-Yale insistence (at least that's what it was when I was a graduate student at Yale eons ago) on "Mr." — and, nowadays, presumably also "Ms." — instead of the "Prof." and/or "Dr." titles used elsewhere in American academia: the idea, I was told at the time, was that since of course all faculty members at Yale (and Harvard) had Ph.D.'s and held high professorial office, it would be tacky to underline the fact by using a professional title. In other words, reverse snobbery.

    But it is a bit of a surprise that the Chronicle of Higher Education would go along with it.

  16. Ben Hemmens said,

    August 20, 2010 @ 12:40 pm

    Congratulations to the junior lab staff: paid peanuts, but behaving better than the monkeys ;-)

  17. Diane said,

    August 20, 2010 @ 1:24 pm

    Wouldn't Hauser have known that both he and a research assistant had coded the experiment independently? Because if he did know that, then he should have also known that there would be a discrepancy between his false codes and his assistant's, and so he should have realized that this would lead to him being caught.

    …hmmm….been thinking…I suppose there are a lot of possible explanations:

    1) Maybe he really didn't know there were two independent codings of the data
    2) Maybe he had pressured the other coder to falsify the codes and thought that they had done so
    3) Maybe he had planned but forgotten to falsify the other coder's codes

    and most likely…

    4) He knew full well he would get caught but he figured his status as PI and stature in the field meant his students and staff would not be able to do anything about it.

    Wow, when I started writing this comment I was just mystified by his behavior and now I have led myself to the conclusion that he is not just a cheater but an arrogant SOB to boot.

  18. dwmacg said,

    August 20, 2010 @ 1:56 pm

    Bickerton: Hauser is a victim.

    I can sort of agree with his point about scientific dogma, but I don't see how that excuses Hauser's alleged actions.

  19. Blake Stacey said,

    August 20, 2010 @ 4:40 pm

    Science now has a news piece up, "Harvard Dean Confirms Misconduct in Hauser Investigation", which includes the text of a letter from Harvard's dean of the Arts and Sciences faculty. It begins as follows:

    No dean wants to see a member of the faculty found responsible for scientific misconduct, for such misconduct strikes at the core of our academic values. Thus, it is with great sadness that I confirm that Professor Marc Hauser was found solely responsible, after a thorough investigation by a faculty investigating committee, for eight instances of scientific misconduct under FAS [Faculty of Arts and Sciences] standards. The investigation was governed by our long-standing policies on professional conduct and shaped by the regulations of federal funding agencies. After careful review of the investigating committee's confidential report and opportunities for Professor Hauser to respond, I accepted the committee's findings and immediately moved to fulfill our obligations to the funding agencies and scientific community and to impose appropriate sanctions.

    One paper has been retracted, another amended with a correction, and a third is being bounced around among researchers and journal editors.

  20. Simon Spero said,

    August 20, 2010 @ 7:08 pm

    Mark (Liberman) is precisely right: this situation shows exactly why it is so important to preserve, and to make available to all, both the data (raw sensory values) and the metadata used to make inferences about the state of the world from that data, before deciding whether there is justification to treat that information as knowledge (or evidence, if you prefer).

    Michael Tomasello's comments in the Science article cited above also show how important it is to automatically retain bidirectional links within and between publications and data sets, so that the ripple effects of the epistemic instability caused by the discovery of errors or deceit can be estimated very quickly. The need for Truth Maintenance Systems is greater now than it has ever been.

    Terabyte-scale data is easy to store for the long term; petascale data is becoming more common; exa- is the new peta- (which is handy for me: having an English accent, and being concerned with scaling issues relating to large numbers of independent data sets, I find that talk about the exa-file problem causes far fewer startled looks).

    Data management for scientific data is about to become a lot more interesting to a lot more people now that the NSF will be requiring *two* pages of data management plans in all proposals (possibly starting as early as October). That's like thirty pages in real writing :-)

    Dealing properly with the issues involved requires collaboration across a huge range of disciplines: understanding what is possible from the technical side, what is desirable from the policy side, and what is useful and in harmony with existing social practices in each scientific domain.

    The "file drawer" is ready to be opened. Be sure to mount a scratch monkey.

    This message has been brought to you by Von Neurath Industries Deep Water Salvage.

  21. Jarek Weckwerth said,

    August 21, 2010 @ 5:44 am

    @dwmacg

    Bickerton: Hauser is a victim.

    I can sort of agree with his point about scientific dogma, but I don't see how that excuses Hauser's alleged actions.

    Hmmm… That Bickerton piece is somewhat disconcerting, I have to say. But it does not really relate to what Hauser may have done. Or perhaps — it seems to use the case as a pretext for criticising the whole school of thought represented by Hauser, among many others. Read the comments — some seem to point that out quite reasonably.

    And I think it shows that Dan Everett's comment on the first post in this series is a legitimate warning. These kinds of flops can have very serious consequences for science in general…

  22. TonyK said,

    August 21, 2010 @ 11:04 am

    Bickerton's piece is barely coherent. "Hauser fell victim to a soon-to-be-outdated view of evolution." WTF? I cry bullshit.

  23. Marc Hauser’s Trolley Problem | Savage Minds said,

    August 22, 2010 @ 1:13 am

    […] Higher ed has published a leaked document from a former research assistant in Hauser's case, Language Log, John Hawks and NeuroAnthropology have all posted some links, greg laden has a hilarious post about […]

  24. dearieme said,

    August 22, 2010 @ 12:56 pm

    Bickerton should explain how an abstraction can be a moral agent while a human is free of all responsibility for his actions. No wonder the reputation of science is in decline. Though, of course, we can always "hide the decline".
