There's new information emerging from the slow-motion Marc Hauser train wreck. Carolyn Johnson, "Journal editor questions Harvard researcher's data", Boston Globe 8/27/2010:
The editor of a scientific journal said today the only "plausible" conclusion he can draw, on the basis of access he has been given to an investigation of prominent Harvard psychology professor Marc Hauser's research, is that data were fabricated.
Gerry Altmann, the editor of the journal Cognition, which is retracting a 2002 article in which Hauser is the lead author, said that he had been given access to information from an internal Harvard investigation related to that paper. That investigation found that the paper reported data that was not present in the videotape record that researchers make of the experiment.
“The paper reports data … but there was no such data existing on the videotape. These data are depicted in the paper in a graph,” Altmann said. “The graph is effectively a fiction and the statistic that is supplied in the main text is effectively a fiction.”
Gerry Altmann posted a more detailed account on his weblog ("harvard misconduct: setting the record straight", 8/27/2010). As indicated in Johnson's article, the facts and interpretations that Altmann provides go beyond, to a shocking degree, the previously described issues of lost data or disagreement about subjective coding of animal behavior.
As Editor of the journal Cognition, I was given access to the results of the investigation into the allegations of misconduct against Marc Hauser as they pertained to the paper published in Cognition in 2002 which has now been retracted. My understanding from those results is the following: the monkeys were trained on what we might call two different grammars (i.e. underlying patterns of sequences of syllables). One group of monkeys were trained on Grammar A, and another group on Grammar B. At test, they were given, according to the published paper, one sequence from Grammar A, and another sequence from Grammar B - so for each monkey, one sequence was drawn from the "same" grammar as it had been trained on, and the other sequence was drawn from the "different" grammar. The critical test was whether their response to the "different" sequence was different to their response to the "same" sequence (this would then allow the conclusion, as reported in the paper, that the monkeys were able to discriminate between the two underlying grammars). On investigation of the original videotapes, it was found that the monkeys had only been tested on sequences from the "different" grammar - that is, the different underlying grammatical patterns to those they had been trained on. There was no evidence they had been tested on sequences from the "same" grammar (that is, with the same underlying grammatical patterns). […]
Given that there is no evidence that the data, as reported, were in fact collected (it is not plausible to suppose, for example, that each of the two test trials were recorded onto different videotapes, or that somehow all the videotapes from the same condition were lost or mislaid), and given that the reported data were subjected to statistical analyses to show how they supported the paper's conclusions, I am forced to conclude that there was most likely an intention here, using data that appear to have been fabricated, to deceive the field into believing something for which there was in fact no evidence at all. This is, to my mind, the worst form of academic misconduct. However, this is just conjecture; I note that the investigation found no explanation for the discrepancy between what was found on the videotapes and what was reported in the paper. Perhaps, therefore, the data were not fabricated, and there is some hitherto undiscovered or undisclosed explanation. But I do assume that if the investigation had uncovered a more plausible alternative explanation (and I know that the investigation was rigorous to the extreme), it would not have found Hauser guilty of scientific misconduct.
As a further bit of background, it’s probably worth knowing that according to the various definitions of misconduct, simply losing your data does not constitute misconduct. Losing your data just constitutes stupidity.
The Globe article and Altmann's blog post have been quickly picked up elsewhere, e.g. Greg Miller, "Journal Editor Says He Believes Retracted Hauser Paper Contains Fabricated Data", Science 8/27/2010:
Evidence of bad behavior by Harvard University cognitive scientist Marc Hauser continues to mount. Today Gerry Altmann, the editor of the journal Cognition, posted a statement on his blog saying that his review of information provided to him by Harvard has convinced him that fabrication is the most plausible explanation for data in a 2002 Cognition paper. The journal had already planned to retract the paper.
And Heidi Ledford, "Cognition editor says Hauser may have fabricated data", Nature 8/27/2010:
The Boston Globe today reports that Gerry Altmann, editor-in-chief of the journal, says that he has seen some of the findings of Harvard University’s internal misconduct investigation of its famed psychologist. According to that investigation, there was no record of some of the data reported in the paper as a graph.
“The graph is effectively a fiction,” Altmann told the Globe. “If it’s the case the data have in fact been fabricated, which is what I as the editor infer, that is as serious as it gets."
FYI, the article in question was (I believe) Hauser, Weiss and Marcus, "Rule learning by cotton-top tamarins", Cognition 86, 2002. And the graph that is "effectively a fiction" is this one:
The statistical analysis of apparently non-existent data would be this:
According to the information provided to Altmann, then, the videos and other materials in Hauser's lab covered only the Different trials, that is, cases where the animals were probed with a "grammatical" pattern different from the one that they had habituated to, either ABB or AAB. Records of the Same trials were systematically missing.
There's an interesting connection to the Science paper that we discussed at length here in 2004 (W. Tecumseh Fitch and Marc D. Hauser, "Computational Constraints on Syntactic Processing in a Nonhuman Primate", Science 303(5656):377-380, 16 January 2004). In that paper, the authors write:
Each grammar created structures out of two classes of sounds, A and B, each of which was represented by eight different CV syllables. The A and B classes were perceptually clearly distinguishable to both monkeys and humans: different syllables were spoken by a female (A) and a male (B) and were differentiated by voice pitch (> 1 octave difference), phonetic identity, average formant frequencies, and various other aspects of the voice source. For any given string, the particular syllable from each class was chosen at random.
At the time, I noted that because the A and B classes were taken from two very different voices (more than an octave apart in mean pitch, among other things), there was essentially no meaning or value in also restricting A and B to two different classes of 8 syllables each. Similarly, the fact that the test materials involved syllable sequences not used in the familiarization trials was meaningless, given that the animals needed only to attend to the (highly salient) difference in pitch patterns.
But in the 2002 Cognition paper, the authors write:
We used the same material that Marcus and colleagues presented to 7-month-old infants in their third experiment. Specifically, subjects were habituated to either a sample of tokens matching the AAB pattern or the ABB pattern. These tokens consisted of CV syllables and were created with a speech synthesizer available at www.bell-labs.com/project/tts/voices-java.html. The 16 strings (“sentences” in Marcus et al.) available in the ABB corpus were: “ga ti ti”, “ga na na”, “ga gi gi”, “li na na”, “li ti ti”, “li gi gi”, “li la la”, “ni gi gi”, “ni ti ti”, “ni na na”, “ni la la”, “ta la la”, “ta ti ti”, “ta na na”, and “ta gi gi”; the AAB sentences were made out of the same CV syllables or “words”.
In other words, the A and B classes in the 2002 paper differed only in the patterns of syllables used, and not in a gross overall difference in pitch and vocal identity.
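To make concrete what it means for the two grammars to differ "only in the patterns of syllables used", here is a minimal Python sketch. It is an illustration, not anything taken from the paper or from Hauser's lab. It uses the ABB strings exactly as quoted above (the quotation says 16 but lists 15), plus a hypothetical helper `to_aab` that rebuilds each string in AAB order from the same syllables; how the paper's AAB sentences were actually assembled is an assumption here. The classifier `pattern` looks only at where the repeated syllable falls, which is the only cue left when both grammars share one syllable inventory and, as far as the quoted description goes, no voice or pitch difference.

```python
from typing import Iterable, Optional, Set

# ABB strings as quoted from the 2002 Cognition paper (15 listed, though the text says 16).
ABB_CORPUS = [
    "ga ti ti", "ga na na", "ga gi gi", "li na na", "li ti ti",
    "li gi gi", "li la la", "ni gi gi", "ni ti ti", "ni na na",
    "ni la la", "ta la la", "ta ti ti", "ta na na", "ta gi gi",
]


def pattern(sentence: str) -> Optional[str]:
    """Return 'AAB' or 'ABB' for a three-syllable string, else None.

    Only the position of the repeated syllable is used; syllable
    identity (and any acoustic property) is ignored entirely.
    """
    parts = sentence.split()
    if len(parts) != 3:
        return None
    a, b, c = parts
    if a == b and b != c:
        return "AAB"
    if b == c and a != b:
        return "ABB"
    return None


def to_aab(sentence: str) -> str:
    """Hypothetical helper: rewrite an ABB string as AAB using the same syllables."""
    a, b, _ = sentence.split()
    return f"{a} {a} {b}"


def syllables(sentences: Iterable[str]) -> Set[str]:
    """Collect the set of distinct syllables used across a corpus."""
    return {syl for s in sentences for syl in s.split()}


if __name__ == "__main__":
    aab_corpus = [to_aab(s) for s in ABB_CORPUS]
    print(pattern("ga ti ti"), pattern(aab_corpus[0]))
    # Both corpora draw on exactly the same syllables.
    print(syllables(ABB_CORPUS) == syllables(aab_corpus))
```

Because syllable identity carries no information about which grammar a string belongs to in this design, a learner has to track the repetition pattern itself. In the 2004 design, by contrast, the pitch difference between the A and B voices would by itself have separated the two classes.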
Why did the 2004 experiment add the strange (and linguistically unnatural) dimension of assigning different "grammatical" categories to different voices? Presumably because otherwise the task, though easy enough for human infants, is very hard for monkeys. Maybe within Hauser's lab it was understood to be not just hard, but essentially impossible.
Presumably there's also a connection to the lab-internal dispute discussed here:
… the experiment in question was coded by Mr. Hauser and a research assistant in his laboratory. A second research assistant was asked by Mr. Hauser to analyze the results. When the second research assistant analyzed the first research assistant's codes, he found that the monkeys didn't seem to notice the change in pattern. In fact, they looked at the speaker more often when the pattern was the same. In other words, the experiment was a bust.
But Mr. Hauser's coding showed something else entirely: He found that the monkeys did notice the change in pattern—and, according to his numbers, the results were statistically significant. If his coding was right, the experiment was a big success.
The quoted article in the Chronicle of Higher Education notes that "The research that was the catalyst for the inquiry ended up being tabled, but only after additional problems were found with the data."
[Update: See also Carolyn Johnson, "Fabrication plausible, journal editor believes", Boston Globe 8/28/2010; and Sofia Groopman and Naveen Srivatsa, "Despite Scandal, Hauser To Teach at Harvard Extension School", Harvard Crimson 8/27/2010; and Eric Felten, "Morality Check: When Fad Science is Bad Science", WSJ 7/27/2010; and Greg Miller, "Hausergate: Scientific Misconduct and What We Know We Don't Know", Science 7/25/2010.]