Reproducible Science at AAAS 2011
I'm at the AAAS 2011 meeting in DC, mainly because as chair-elect of Section Z (Linguistics) I'm duty-bound to be here, but also partly because I'm giving a talk in a symposium tomorrow afternoon on "The Digitization of Science: Reproducibility and Interdisciplinary Knowledge Transfer". The session was organized by Victoria Stodden, and this is its abstract:
Scientific computation is emerging as absolutely central to the scientific method, but the prevalence of very relaxed practices is leading to a credibility crisis affecting many scientific fields. It is impossible to verify most of the results that computational scientists present at conferences and in papers today. Reproducible computational research, in which all details of computations — code and data — are made conveniently available to others, is a necessary response to this crisis. This session addresses reproducible research from three critical vantage points: the consequences of reliance on unverified code and results as a basis for clinical drug trials; groundbreaking new software tools for facilitating reproducible research and pioneered in a bioinformatics setting; and new survey results elucidating barriers scientists face in the practice of open science as well as proposed policy solutions designed to encourage open data and code sharing. A rapid transition is now under way — visible particularly over the past two decades — that will finish with computation as absolutely central to scientific enterprise, cutting across disciplinary boundaries and international borders and offering a new opportunity to share knowledge widely.
Victoria Stodden's blog, though not updated frequently, is worth reading. One post that I especially enjoyed was a discussion of HackNY, a summer program aiming to "'get the kids off the street' by giving them alternatives to entering the finance profession".
My own contribution to today's symposium is "Lessons for Reproducible Science from the DARPA Speech and Language Program":
Since 1987, DARPA has organized most of its speech and language research in terms of formal, quantitative evaluation of computational solutions to well-defined "common task" problems. What began as an attempt to ensure against fraud turned out to be an extraordinarily effective way to foster technical communication and to explore a complex space of problems and solutions. This engineering experience offers some useful (if partial) models for reproducible science, especially in the area of data publication; and it also suggests that the most important effects may be in lowering barriers to entry and in increasing the speed of scientific communication.
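As a toy illustration of what "formal, quantitative evaluation" means in that tradition: speech recognizers are scored by word error rate, the word-level edit distance between a system's hypothesis and a reference transcript, normalized by the length of the reference. Here is a minimal sketch in Python (the transcripts are invented, and the real evaluations use NIST scoring tools such as sclite rather than anything this simple):

    # Toy word-error-rate scorer of the kind used in common-task evaluations:
    # a hypothesis transcript is aligned to a reference transcript by
    # word-level edit distance, and WER = (subs + dels + ins) / reference length.
    def word_error_rate(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # standard dynamic-programming edit distance over words
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # one substitution against a six-word reference: WER = 1/6
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))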
Update — Victoria Stodden has put the slides and other links from the presentations up on the web here.
The first talk in the session offered an especially vivid explanation of why "reproducible research" is a consequential slogan: Keith Baggerly, "The Importance of Reproducible Science in High-Throughput Biology: Case Studies". His abstract:
High-throughput biological assays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow for reproduction, leading to exercises in “forensic bioinformatics” where raw data and reported results are used to infer what the methods must have been. Unfortunately, poor documentation can shift from an inconvenience to an active danger when it obscures not just methods but errors.
In this talk, we examine several related papers using array-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials were allocated to treatment arms based on these results. However, we show in several case studies that the reported results incorporate several simple errors that could put patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. We briefly discuss steps we are taking to avoid such errors in our own investigations.
The presentation was chilling. For an earlier version of the same talk, with a video of the lecture synchronized to his slides, see here. (This is also a good example of what, in my opinion, the AAAS should do with the presentations in the >150 symposia at each annual meeting, in place of the high-1980s technology of selling audio CDs.)
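To make the "row or column offsets" failure mode concrete, here is a toy sketch (invented data and sample names, nothing from the papers Baggerly discusses) of how a one-row shift between sample labels and measurements silently hands every patient someone else's prediction:

    # Toy illustration of an off-by-one row offset: sample labels and
    # measurements live in parallel lists, and a one-row shift (say, a header
    # row counted as data in one file but not the other) re-assigns every
    # measurement to the wrong sample without raising any error.
    samples     = ["patient_01", "patient_02", "patient_03", "patient_04"]
    sensitivity = [0.91, 0.12, 0.88, 0.09]  # predicted drug sensitivity, in sample order

    correct = dict(zip(samples, sensitivity))
    shifted = dict(zip(samples, sensitivity[1:] + sensitivity[:1]))  # the offset

    print(correct)  # {'patient_01': 0.91, 'patient_02': 0.12, ...}
    print(shifted)  # every patient now gets someone else's prediction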
Gunnar said,
February 18, 2011 @ 7:04 am
Interesting. For those of us not in DC, is there any way to access notes from or a transcript of the symposium? Or alternatively, are there published articles where the participants present substantially the same things?
[(myl) Some similar themes, from a different set of participants, can be found in a session at the Berlin 6 OA conference, discussed here with some links to presentations and other materials. I'll add some notes on today's session, as well as links to relevant publications by the participants. And finally, I understand that the AAAS, in a dazzling display of 1980s technology, records its symposia and allows interested parties to buy CDs of the recordings. I'll provide more details when I have them.]
John Roth said,
February 18, 2011 @ 11:35 am
This is not just laudable, I think it's long overdue. As a software developer, I've seen the occasional scathing comment about code quality for some of these efforts. I suspect that publication and outside review might be a bit painful at first, but over the long haul it can't be anything other than beneficial.
I'm encouraged by what's happening in the genetics area with publicly available tools. There are several grassroots projects that are using public data, as well as contributed genome data, to analyze the genomics of various groups.
peterm said,
February 19, 2011 @ 3:33 am
Victoria Stodden's blog has a letter which she and Mark Gerstein sent to the journal Nature, which included the following text:
"The approach taken by the journal Biostatistics serves as an exemplar: code and data are submitted to a designated “reproducibility editor” who tries to replicate the results. If he or she succeeds, the first page of the article is kitemarked “R” (for reproducible) and the code and data made available as part of the publication."
This proposal strikes me as profoundly missing the point. Much of the real work of science, and much scientific progress, happens in the contestation of methods, and in argument and cogitation about replication and replicability. In my research experience, replicating a computer simulation is rarely if ever something done (or not done) once and for all by someone running a program once; it is a long, detailed, and often problematic process involving back-and-forth dialog between the various protagonists and other interested parties, considerable simulation trial and error, and significant thought about the domain, about theoretical models for it, and about their representation through computer models. None of this is something I could imagine being possible to delegate to a journal editor, were it even desirable to do so.
John Roth said,
February 19, 2011 @ 11:30 am
@peterm
The point is, I think, transparency. The question being asked is not whether the material submitted is scientifically relevant, but whether it actually runs so that someone else has a reasonable expectation that they can examine it, run it and get the same output. That's the starting point for a scientifically interesting critique, not the end point.
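To sketch what I mean by "run it and get the same output" (the file names and expected hash here are purely hypothetical), the check could be as simple as re-running the published script and comparing a checksum of its output against the one the authors report:

    # Re-run the published analysis and check its output against the
    # authors' reported checksum. All names here are placeholders.
    import hashlib
    import subprocess

    EXPECTED_SHA256 = "hash-published-by-the-authors"

    subprocess.run(["python", "analysis.py"], check=True)  # regenerates results.csv

    with open("results.csv", "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()

    print("reproduced" if actual == EXPECTED_SHA256 else "output does NOT match")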
Your other point about the long, detailed and problematic process of arriving at a satisfactory model goes into a highly contentious area of appropriate code documentation. I imagine this will take a while to thrash out, and might eventually wind up with something like Knuth's Literate Programming where the program documentation is actually a tutorial that explains the decisions taken and alternatives considered and rejected.
Ref: http://www-cs-faculty.stanford.edu/~uno/lp.html
Peter Taylor said,
February 19, 2011 @ 5:58 pm
@peterm, given that one of the things we learnt in the hoo-hah over UEA and climate change was that they hadn't documented their process well enough to reproduce their own results, it hardly seems to be missing the point to require them to prove at least that minimal level of reproducibility.
Barbara Phillips Long said,
February 20, 2011 @ 11:22 pm
Baggerly's comment about errors and risk interested me:
"However, we show in several case studies that the reported results incorporate several simple errors that could put patients at risk."
This comment reminds me forcibly of Atul Gawande's "The Checklist Manifesto," which seeks to avert other types of medical errors that are preventable. Researchers, as well as physicians, should seriously consider the case he makes for human fallibility.
Preventing errors in any field of research seems likely to make the research more credible for laymen, too.
Fernando Perez said,
February 22, 2011 @ 3:28 am
@Gunnar, Victoria's page about the symposium (http://www.stanford.edu/~vcs/AAAS2011/) now has all the slides and audio files from the talks (video wasn't recorded). I also posted on my blog (http://blog.fperez.org/2011/02/reproducible-research-at-aaas-2011.html, even less frequently updated than Victoria's) expanding on some of the topics that I discussed during my talk.
Lane Schwartz said,
February 22, 2011 @ 8:51 am
Thanks for the excellent post. Reproducibility is at the heart of science – if our publications do not detail methods (including code used) in sufficient detail to allow exact reproduction of results, what we are doing is simply not science.
I highly recommend this article on the topic from the Computational Linguistics journal by Ted Pedersen of the University of Minnesota, Duluth:
http://www.aclweb.org/anthology-new/J/J08/J08-3010.pdf
Nick Barnes said,
March 3, 2011 @ 8:03 am
@peterm: The point of a reproducibility editor and kitemark is not to offload any of the work of method development which you describe. Authors still have to choose and develop appropriate numerical/computational methods for their work, and this will still involve exactly the same sort of back-and-forth. The point of the editor and kitemark is to ensure that those methods, as they stand when the authors deem their results fit for publication, are fully and accurately documented and preserved.
These days, when computer work is often complex and involved, publications very rarely contain enough information to reproduce numerical methods. Published descriptions are often inadequate or misleading, and sometimes simply inaccurate. The editor and kitemark address these points. The code may also be buggy (it probably is: most programs are) and is often lost, so that even the authors cannot reproduce or refine their analysis in future. Making the code available, in a publication repository or elsewhere, addresses that.
I recently wrote up a simple case study at http://climatecode.org/blog/2011/03/why-publish-code-a-case-study/