{"id":21956,"date":"2015-10-31T07:30:31","date_gmt":"2015-10-31T12:30:31","guid":{"rendered":"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=21956"},"modified":"2015-11-05T07:44:55","modified_gmt":"2015-11-05T12:44:55","slug":"replicability-vs-reproducibility-or-is-it-the-other-way-around","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=21956","title":{"rendered":"Replicability vs. reproducibility &#8212; or is it the other way around?"},"content":{"rendered":"<p>The term <strong><em>reproducible research<\/em><\/strong>, in its <a href=\"https:\/\/scholar.google.com\/scholar?hl=en&amp;q=&quot;reproducible+research&quot;\" target=\"_blank\">current sense<\/a><strong><em>,\u00a0<\/em><\/strong>was<strong><em>\u00a0<\/em><\/strong>coined about 1990 by the geophysicist Jon Claerbout.<strong><em>\u00a0\u00a0<\/em><\/strong>Thus Jon Claerbout &amp; Martin Karrenbach, \"<a href=\"http:\/\/sepwww.stanford.edu\/doku.php?id=sep:research:reproducible:seg92\" target=\"_blank\">Electronic Documents Give <strong>Reproducible Research<\/strong> a New Meaning<\/a>\", <em>Society of Exploration Geophysics<\/em> 1992<em>\u00a0[emphasis added, here and throughout]<\/em>:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">A revolution in education and technology transfer follows from the marriage of word processing and software command scripts. In this marriage an author attaches to every figure caption a pushbutton or a name tag usable to recalculate the figure from all its data, parameters, and programs. This provides a concrete definition of <strong>reproducibility<\/strong> in computationally oriented research. Experience at the Stanford Exploration Project shows that preparing such electronic documents is little effort beyond our customary report writing; mainly, we need to file everything in a systematic way. 
[&#8230;]<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">The principal goal of scientific publications is to teach new concepts, show the resulting implications of those concepts in an illustration, and provide enough detail to make the work reproducible. In real life, <strong>reproducibility<\/strong> is haphazard and variable. Because of this, we rarely see a seismology PhD thesis being redone at a later date by another person. In an electronic document, readers, students, and customers can readily verify results and adapt them to new circumstances without laboriously recreating the author's environment.<\/span><\/p>\n<p><!--more--><\/p>\n<p>I organized a <a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/ldc\/Berlin6Session5\/Overview.html\" target=\"_blank\">session<\/a>\u00a0on <a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=830\" target=\"_blank\">\"<strong>Reproducible Research<\/strong>\"<\/a> at the <a href=\"http:\/\/openaccess.mpg.de\/82792\/zoom.jpg\" target=\"_blank\">Berlin 6 Open Access Conference<\/a>\u00a0in 2008; and Victoria Stodden organized a session entitled \"<a href=\"https:\/\/aaas.confex.com\/aaas\/2011\/webprogram\/Session3166.html\" target=\"_blank\">The Digitization of Science: <strong>Reproducibility<\/strong> and Interdisciplinary Knowledge Transfer<\/a>\" at AAAS 2011 (LLOG coverage <a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=2976\" target=\"_blank\">here<\/a>).<\/p>\n<p>Because research in Claerbout's lab mainly involved analysis of published seismological recordings collected and published by the USGS, the idea of re-doing an experiment by collecting new data\u00a0didn't ordinarily\u00a0arise &#8212; the closest thing would be what that paper calls \"adapting results to new circumstances\". 
And much the same situation obtains in other areas\u00a0where the goal is to model or explore large shared datasets, as is the case in most modern research in computational linguistics.<\/p>\n<p>But in many\u00a0other fields, it's natural to wonder whether an experiment would work if someone else tried to follow a similar recipe from start to finish. So at\u00a0some point between 1990 and 2006, people in this tradition\u00a0began using terms in the word family\u00a0<strong><em>replication, replicable, replicability<\/em><\/strong> to refer to the (traditional) process of completely re-running an experiment, with\u00a0all the effects of new researchers, new equipment, new subjects or other raw materials, etc.\u00a0Thus Roger Peng et al., \"<a href=\"http:\/\/aje.oxfordjournals.org\/content\/163\/9\/783.short\" target=\"_blank\"><strong>Reproducible<\/strong> Epidemiologic Research<\/a>\", <em>American Journal of Epidemiology<\/em> 2006:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">The <strong>replication<\/strong> of important findings by multiple independent investigators is fundamental to the accumulation of scientific evidence. Researchers in the biologic and physical sciences expect results to be <strong>replicated<\/strong> by independent data, analytical methods, laboratories, and instruments. Epidemiologic studies are commonly used to quantify small health effects of important, but subtle, risk factors, and replication is of critical importance where results can inform substantial policy decisions. However, because of the time, expense, and opportunism of many current epidemiologic studies, it is often impossible to fully <strong>replicate<\/strong> their findings. An attainable minimum standard is \u201c<strong>reproducibility<\/strong>,\u201d which calls for data sets and software to be made available for verifying published findings and conducting alternative analyses. 
The authors outline a standard for <strong>reproducibility<\/strong> and evaluate the <strong>reproducibility<\/strong> of current epidemiologic research. They also propose methods for <strong>reproducible research<\/strong> and implement them by use of a case study in air pollution and health.<\/span><\/p>\n<p>For another example of the same terminological tradition, see \"<a href=\"http:\/\/simplystatistics.org\/2012\/04\/18\/replication-psychology-and-big-science\/\" target=\"_blank\">Replication, psychology, and Big Science<\/a>\", <em>Simply Statistics<\/em> 2012:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">A study is <strong>reproducible<\/strong> if there is a specific set of computational functions\/analyses (usually specified in terms of code) that exactly reproduce all of the numbers in a published paper from raw data.\u00a0It is now recognized that a critical component of the scientific process is that data analyses can be reproduced. This point has been driven home particularly for personalized medicine applications, where irreproducible results\u00a0<a style=\"color: #800000;\" href=\"http:\/\/www.nature.com\/news\/lapses-in-oversight-compromise-omics-results-1.10298?nc=1332884191164\" target=\"_blank\">can lead to delays<\/a>\u00a0in evaluating new procedures that affect patients\u2019 health.\u00a0<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">But just because a study is <strong>reproducible<\/strong> does not mean that it is\u00a0<strong><em>replicable<\/em><\/strong>. <strong>Replicability<\/strong> is stronger than <strong>reproducibility<\/strong>. A study is only <strong>replicable<\/strong> if you perform the exact same experiment (at least) twice, collect data in the same way both times, perform the same data analysis, and arrive at the same conclusions. 
The difference with <strong>reproducibility<\/strong> is that to achieve <strong>replicability<\/strong>, you have to perform the experiment and collect the data again. This of course introduces all sorts of new potential sources of error in your experiment (new scientists, new materials, new lab, new thinking, different settings on the machines, etc.)<\/span><\/p>\n<p>And there's a substantial and growing literature on (computational and social) methods for achieving <strong>reproducibility<\/strong> in Claerbout's sense, and <strong>replicability<\/strong> in Peng's sense.\u00a0An important recent survey of the movement\u00a0is Victoria Stodden et al., Eds., <strong><em><a href=\"https:\/\/books.google.com\/books?hl=en&amp;lr=&amp;id=JcmSAwAAQBAJ\" target=\"_blank\">Reproducible Research<\/a><\/em><\/strong>, Taylor &amp; Francis 2014:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">Science moves forward when discoveries are <strong>replicated<\/strong> and <strong>reproduced<\/strong>. In general, the more frequently a given relationship is observed by independent scientists, the more trust we have that such a relationship truly exists in nature. <strong>Replication<\/strong>, the practice of independently implementing scientific experiments to validate specific findings, is the cornerstone of discovering scientific truth. Related to <strong>replication<\/strong> is <strong>reproducibility<\/strong>, which is the calculation of quantitative scientific results by independent scientists using the original datasets and methods. <strong>Reproducibility<\/strong> can be thought of as a different standard of validity because it forgoes independent data collection and uses the methods and data collected by the original investigator. 
<strong>Reproducibility<\/strong> has become an important issue for more recent research due to advances in technology and the rapid spread of computational methods across the research landscape.<\/span><\/p>\n<p>Clear enough, right?<\/p>\n<p>But more recently, some researchers have started using\u00a0the same terms with the reference more or less switched. I think that this confusion originates with\u00a0Chris Drummond, \"<a href=\"http:\/\/www.csi.uottawa.ca\/~cdrummon\/pubs\/ICMLws09.pdf\" target=\"_blank\"><strong>Replicability<\/strong> is not <strong>Reproducibility<\/strong>: Nor is it Good Science<\/a>\", ICML 2009:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">At various machine learning conferences, at various times, there have been discussions arising from the inability to <strong>replicate<\/strong> the experimental results published in a paper. There seems to be a wide spread view that we need to do something to address this problem, as it is essential to the advancement of our field. The most compelling argument would seem to be that <strong>reproducibility<\/strong> of experimental results is the hallmark of science. Therefore, given that most of us regard machine learning as a scientific discipline, being able to <strong>replicate<\/strong> experiments is paramount. I want to challenge this view by separating the notion of <strong>reproducibility<\/strong>, a generally desirable property, from <strong>replicability<\/strong>, its poor cousin. I claim there are important differences between the two. <strong>Reproducibility<\/strong> requires changes; <strong>replicability<\/strong> avoids them. Although <strong>reproducibility<\/strong> is desirable, I contend that the impoverished version, <strong>replicability<\/strong>, is one not worth having.<\/span><\/p>\n<p>And Drummond's confusion has been picked up by a few others &#8212; e.g. 
Arturo Casadevall &amp; Ferric Fang, \"<a href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2981311\/\" target=\"_blank\"><strong>Reproducible<\/strong> Science<\/a>\", <em>Infection and Immunity<\/em> 2010:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">Although many biological scientists intuitively believe that the <strong>reproducibility<\/strong> of an experiment means that it can be <strong>replicated<\/strong>, Drummond makes a distinction between these two terms. Drummond argues that <strong>reproducibility<\/strong> requires changes, whereas <strong>replicability<\/strong> avoids them. In other words, <strong>reproducibility<\/strong> refers to a phenomenon that can be predicted to recur even when experimental conditions may vary to some degree. On the other hand, <strong>replicability<\/strong> describes the ability to obtain an identical result when an experiment is performed under precisely identical conditions.<\/span><\/p>\n<p>Or Thilo Mende, \"<a href=\"http:\/\/dl.acm.org\/citation.cfm?id=1868336\" target=\"_blank\"><strong>Replication<\/strong> of Defect Prediction Studies: Problems, Pitfalls and Recommendations<\/a>\", PROMISE 2010:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">In the early days, [&#8230;]\u00a0most prediction models were based on proprietary data, thus preventing independent <strong>replication<\/strong>. With the rise of the PROMISE repository, this situation has changed. This repository collects publicly available data sets, the majority of them for the task of defect prediction. Currently, there are more than 100 such data sets inside the PROMISE repository, and many more are made available elsewhere. <\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">This trend is very beneficial, as it enables researchers to independently verify or refute previous results. 
Drummond argues that <strong>replication<\/strong> \u2014 the repetition of an experiment without any changes \u2014 is not worthwhile. He favors <strong>reproducing<\/strong> experiments with changes, since only this adds new insights. While we agree that the pure <strong>replication<\/strong> of experiments on the same data sets should not lead to new results, we argue that <strong>replicability<\/strong> is nevertheless important: When applying previously published procedures to new data sets, or new procedures to well-known data sets, researchers should be able to validate their implementations using the originally published results.<\/span><\/p>\n<p>This confusion seems to have led some researchers to reject the whole distinction &#8212; e.g. Brian Nosek, \"<a href=\"http:\/\/pps.sagepub.com\/content\/7\/6\/657.full\" target=\"_blank\">An Open, Large-Scale, Collaborative Effort to Estimate the <strong>Reproducibility<\/strong> of Psychological Science<\/a>\", <em>Perspectives on Psychological Science<\/em> 2012:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #003300;\">Some distinguish between \u201c<strong>reproducibility<\/strong>\u201d and \u201c<strong>replicability<\/strong>\u201d by treating the former as a narrower case of the latter (e.g., computational sciences) or vice versa (e.g., biological sciences). We ignore the distinction.<\/span><\/p>\n<p>(As the citations in this post suggest, and as a little poking around in Google Scholar will confirm, Nosek's notion that\u00a0this is a difference between Computer Science and Biology is false. 
As far as I can tell, it's a difference between people influenced by Drummond's provocative but deeply confused article, and everybody else in a dozen different fields &#8212; though maybe there has been some independent invention of related confusions as well.)<\/p>\n<p>It seems to me that<\/p>\n<ol>\n<li>Under whatever names, the distinction between <strong>replicability<\/strong> and <strong>reproducibility<\/strong> is worth preserving (and indeed extending &#8212; see below);<\/li>\n<li>Since the technical term \"<strong>reproducible research<\/strong>\" has been in use since 1990, and the technical distinction between <strong>reproducible<\/strong> and <strong>replicable<\/strong> at least since 2006, we should reject Drummond's 2009 attempt to\u00a0re-coin\u00a0the technical terms <strong>reproducible<\/strong> and <strong>replicable<\/strong>\u00a0in senses that assign the terms to concepts nearly opposite to those used in the definitions\u00a0by Claerbout, Peng and others.<\/li>\n<\/ol>\n<p>Why preserve\u00a0the distinction in an extended or elaborated form? Because there are many variations on the theme, all of them sometimes worthwhile. We might re-apply the original\u00a0computational analysis\u00a0to the original data, perhaps to\u00a0check unreported aspects of the method or the results; we might re-implement the model or algorithm and apply it to the original data, to test a new program or to check for coding or algorithmic errors; we might apply the original computational analysis to new data, meant to test exactly the same hypotheses; we might apply a different model or algorithm to the original data, as an independent\u00a0test of the original hypothesis, or in support of a different one; we might apply the original computational analysis to new data focused on an analogous but systematically different set of questions; and so on. 
Perhaps the most common and most valuable variation is benchmarking alternative models or algorithms with respect to the same quantitative evaluative metric on the same training and testing material. All of these varied responses to a publication are consistent with Jon Claerbout's original vision, in my opinion, though they go far beyond\u00a0simply \"[attaching] to every figure caption a pushbutton or a name tag usable to recalculate the figure from all its data, parameters, and programs\".<\/p>\n<p>For many\u00a0reasons, I think that Drummond is profoundly wrong on the substance. But even if you believe his assertion that\u00a0\"Although X\u00a0is desirable, I contend that the impoverished version, Y, is one not worth having\", you should reject his attempt to swap the reference\u00a0of the terms X and Y, substituting <em><strong>reproducibility<\/strong><\/em> for what many others have been calling <em><strong>replicability<\/strong><\/em>, and vice versa.<\/p>\n<p>Some previous LLOG posts on related topics:\u00a0\u00a0\"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=830\" target=\"_blank\">Reproducible research<\/a>\" (11\/14\/2008); \"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=2976\" target=\"_blank\">Reproducible Science at AAAS 2011<\/a>\" (2\/18\/2011); \"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=10693\" target=\"_blank\">Literate programming and reproducible research<\/a>\" (2\/22\/2014); and \"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=17916\" target=\"_blank\">Reliability<\/a>\" (2\/28\/2015).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The term reproducible research, in its current sense,\u00a0was\u00a0coined about 1990 by the geophysicist Jon Claerbout.\u00a0\u00a0Thus Jon Claerbout &amp; Martin Karrenbach, \"Electronic Documents Give Reproducible Research a New Meaning\", Society of Exploration Geophysicists 1992\u00a0[emphasis added, here and throughout]: A revolution in education and technology 
transfer follows from the marriage of word processing and software command scripts. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[8],"tags":[],"class_list":["post-21956","post","type-post","status-publish","format-standard","hentry","category-the-language-of-science"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/21956","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=21956"}],"version-history":[{"count":23,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/21956\/revisions"}],"predecessor-version":[{"id":22052,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/21956\/revisions\/22052"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=21956"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.up
enn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=21956"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=21956"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}