Sproat asks the question
The "Last Words" segment of the latest issue of Computational Linguistics is by Richard Sproat: "Ancient Symbols, Computational Linguistics, and the Reviewing Practices of the General Science Journals". Richard reviews and extends the analysis (partly contributed by him and by Cosma Shalizi) in "Conditional entropy and the Indus Script" (4/26/2009) and "Pictish writing?" (4/2/2010), and poses the question that I was too polite to ask:
How is it that papers that are so trivially and demonstrably wrong get published in journals such as Science or the Proceedings of the Royal Society?
As Richard notes, Fernando Pereira made a similar point in "Falling for the magic formula", 4/26/2009:
Once again, Science falls for a magic formula that purports to answer a contentious question about language: is a certain ancient symbolic system a writing system. They would not, I hope, fall for a similar hypothesis in biology. They would, I hope, be skeptical that a single formula could account for the complex interactions within an evolutionarily built system. But somehow it escapes them that language is such a system, as are other culturally constructed symbol systems, carrying lots of specific information that cannot be captured by a single statistic.
This is not the first time that I've wondered, silently, about the reviewing standards of journals like Science and Nature in the fields where I have some expertise. I have mixed feelings about this, in part because I'm happy to see language-related matters get any coverage at all in such venues, and in part because I think that the peer-review process is generally broken, and also is no longer very well suited to the circumstances of modern scholarly and scientific communication. But still…
Richard Sproat said,
September 17, 2010 @ 11:08 am
With that paper, Rao et al.'s (and Lee et al.'s) response, and my reply to their responses, things seem to have heated up a bit again in blogland:
http://horadecubitus.blogspot.com/2010/09/indus-argument-continues.html
I'm not sure what to say about some of the points that Siddharthan makes here. My point that one could interpret Rao et al.'s original claims about the role of the entropic evidence in various ways — inductive, deductive — would appear to be valid. And it would also appear to be valid to point out that they should have made it clear at the outset which of those interpretations was the one intended. For some reason Siddharthan does not seem to accept that argument.
But this may be due to a more general instance of misunderstanding. For example he elsewhere notes: "On the first page, he clarifies that he was talking only of the Science paper and has not read the more recent papers by Rao and colleagues". Did I say I hadn't *read* them? I don't think I said that.
The one interesting thing in Siddharthan's blog is that Rao finally gives us an explanation for why they did not include Fortran in Figure 1A in their original Science paper:
"By the way, the reason that Fortran was included in Fig 1B rather than 1A is quite mundane: a reviewer asked us to compare DNA, proteins, and Fortran, and we included these in a separate plot in the revised version of the paper. Just to prove we didn't have any nefarious designs, I have attached a plot that Nisha Yadav created that includes Fortran in the Science Fig 1A plot. The result is what we expect."
They then show a rather convincing looking version of 1A that includes Fortran:
http://2.bp.blogspot.com/_xHva9tn5xBY/TJM0Tg4m6LI/AAAAAAAAAVQ/hz4oaEhj1Jc/s1600/SciencePlotWithFortran.jpg
But this is puzzling: if it's so convincing looking, why would they *not* have updated Figure 1A to include it? After all, it would take no more space.
Also, unless I'm missing something, it seems puzzling that Fortran looks *more* different from "language" with the bigram entropy calculation above, than it does in the more recent block entropy calculations that depend on wider context. See (b) in the following figure:
http://www.cs.washington.edu/homes/rao/IndusCompLing_files/image003.gif
I have to think this through more: maybe it makes sense. But on the face of it it seems surprising.
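For concreteness, here is a minimal sketch of the two statistics under discussion, computed with plain maximum-likelihood counts rather than the Kneser-Ney smoothing or the Bayesian block-entropy estimator (Nemenman et al. 2002) that the published plots rely on; the toy corpus below is only a stand-in, not data from any of the papers, so the numbers it produces mean nothing in themselves.

```python
# Minimal sketch, not the method of Rao et al.: unsmoothed maximum-likelihood
# estimates of (a) bigram conditional entropy H(X_n | X_{n-1}) and
# (b) block entropy H_N over blocks of N consecutive symbols.
from collections import Counter
from math import log2

def _entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conditional_bigram_entropy(symbols):
    """Estimate H(X_n | X_{n-1}) = H(bigrams) - H(unigrams), in bits per symbol."""
    unigrams = Counter(symbols)
    bigrams = Counter(zip(symbols, symbols[1:]))
    return _entropy(bigrams) - _entropy(unigrams)

def block_entropy(symbols, N):
    """Estimate H_N: entropy of the distribution over blocks of N consecutive symbols."""
    blocks = Counter(tuple(symbols[i:i + N]) for i in range(len(symbols) - N + 1))
    return _entropy(blocks)

# Toy stand-in corpus; a real comparison would use the corpora from the papers
# (Indus sign sequences, natural-language text, Fortran, DNA, and so on).
corpus = list("the cat sat on the mat ") * 50
print(conditional_bigram_entropy(corpus))
print([block_entropy(corpus, n) for n in (1, 2, 3, 4)])
```

With estimators this naive, a small corpus badly underestimates block entropy as N grows, which is part of why the published work uses smoothed or Bayesian estimators; the sketch is only meant to show what quantity is being plotted.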
Heather W. said,
September 17, 2010 @ 11:14 am
A preprint of Rao et al's response to Sproat's Last Word article is available here: http://www.cs.washington.edu/homes/rao/IndusCompLing.pdf
I'm not qualified to evaluate their new arguments, but I thought it might be of interest to the folks here.
Bob Lieblich said,
September 17, 2010 @ 11:31 am
I had the experience, many years ago, of representing one of two sides in a contentious litigation. It was concluded by a fortuitous event, almost a deus ex machina, but the legal dispute lingered unresolved. A few months after the resolution, a scholarly legal periodical printed an article that was an almost verbatim copy of the other side's legal brief in the case. Most of its "conclusions" were simply one side of an ongoing legal argument. Had they been couched as issues awaiting resolution, that would have been fine, but they were presented as objective truth.
I knew the editor of the journal, so I called him and asked if he wanted to hear from the other side. He agreed, but I decided to let that sleeping dog lie. He called a few months later to ask why I had never submitted anything. After explaining, I asked why he had published the original article, and he admitted that he was desperate for material and would publish almost anything within the scope of his publication. He was the only person to have vetted most of the published articles, and in the case of the one I'm talking about he knew relatively little about the subject matter.
It's a long way from law to linguistics, at least as academic disciplines, but I wonder to what extent the need to fill pages relates to the quality of what makes it into print.
John Lawler said,
September 17, 2010 @ 11:38 am
I don't know, Bob. Science probably rejects at least 90% of what's submitted; I know that's true of Language, which is a less prestigious place to publish (except for linguists in the know).
Richard Sproat said,
September 17, 2010 @ 11:44 am
I doubt that this has anything to do with the need to fill pages. Science (and the Proceedings of the Royal Society) had every reason to believe that these papers would produce a sensation in the popular press. And they were right. Rao's paper got wide coverage, especially in the Indian media. Lee et al.'s paper was less widely covered only because Pictish symbols don't have the poignancy for a lot of people that the Indus Valley civilization apparently has for many; but it got coverage nonetheless.
It seems clear to me that the expectation that these papers would stir controversy was a major motivating factor.
Bob Lieblich said,
September 17, 2010 @ 11:55 am
Thank you, gentlemen. I stand corrected. I infer from this glut of papers on the scientific side, and the dearth on the legal side, that the country is awash in scientists and needs more lawyers.
David L said,
September 17, 2010 @ 12:26 pm
I can't comment on these papers — haven't read them, and I'm not a linguist anyway — but I can say something about editorial processes at journals such as Science and Nature, having once been in that line of work.
The central issue, as Richard Sproat surmises in his commentary, is that these journals are always eager to publish papers that are a little bit outside their usual specialties, because it bolsters their image as general rather than specialist publications. But when any such paper shows up, it's hard for the editors to deal with, because they don't have any familiarity with the field. It's also true that they have to be aware of their audience. A hardcore linguistics paper would be tough, because hardcore linguists (correct me if I'm wrong) don't normally look in Science and Nature for hot research in their field. But a paper that mixes a bit of linguistics, a bit of archaeology, a bit of computational theory, a bit of statistical mechanics — that seems like something that would appeal to a wide range of readers.
It's also the case, I imagine, that hardcore linguists are not likely to think of sending papers to Science and Nature – and the journals can only publish what's submitted to them.
I don't doubt that the Science editors did their best to review the paper in question diligently. It would be nice to think that a careful reviewer would admit that he or she didn't really understand some part of the argumentation, and suggest some other expert reviewer, but alas that doesn't happen as often as one might hope.
As to what you (linguists, I mean) could do about this — get in touch with the journal and offer to write a review/perspective/news&views piece on computational linguistics. If you present it as something intended to explain your specialty to a scientific audience who may not know of your existence, the journal might well be interested.
Jerry Friedman said,
September 17, 2010 @ 1:04 pm
@MYL: The peer-review process fulfills its function excellently: reviewers' papers get cited.
I've wondered about trivially wrong papers in physics. The only paper my name is on in a traditional refereed journal contains a mistake in arithmetic (though it's not crucial to the argument). The time I helped my adviser referee a paper, I didn't check details of the calculations and I don't remember him mentioning that he'd done so. In physics, biology, and English, I've seen accepted papers that contained what I had no doubt were misunderstandings, erroneous arguments, and ignorance of basic principles.
Brett said,
September 17, 2010 @ 1:26 pm
The top scientific journals, Science and Nature, publish far more biology and biochemistry than they do anything else. The quality control in those areas is pretty good. However, among scientists outside biology and chemistry, it is almost a truism that virtually the only way to get published in these journals is to make a very sweeping claim that is understandable to people with only an undergraduate background in the subject. Publishing in these journals is extremely prestigious, and this creates a huge incentive for each submitted manuscript to overstate the importance of a given result and to limit discussion of possible caveats to the conclusions. (The length restrictions also contribute to this last point.) At best, the resulting publications are woefully short on explanation, but quite frequently, they present potentially quite interesting results, grossly over-interpreted and extrapolated beyond reasonable limits.
Karen said,
September 17, 2010 @ 2:14 pm
This is not the first time that I've wondered, silently, about the reviewing standards of journals like Science and Nature in the fields where I have some expertise.
This observation is the sort of thing that makes me wonder, silently perhaps!, about the standards in the fields where I don't have expertise. Trivially, an article highly critical of some aspects of the nuclear power plants in my home town spoke of "the" county it lies in and "the three high schools". As there is only one high school, and the city sprawls out over much of two counties, I was forced to wonder whether anything the writer said about the more esoteric elements of his story was any more accurate than these easily checked statements were…
Nicholas Waller said,
September 17, 2010 @ 2:30 pm
@Karen – wandering a bit OT, but the this-easily-checkable-trivially-wrong-thing-that-I-know-is-wrong-so-I-wonder-what-else-is element is relevant also to fiction, about which Charles Stross is currently hosting a blogpost discussion.
Nadia T. said,
September 17, 2010 @ 2:58 pm
"Would a paper that made some blatantly wrong claim about genetics be published in such venues?" I don't know of an example actually in genetics, but describing a new species, seems, at least to me, to be related. Surely a general editor can't claim that they are unaware that specialists on describing species exist; they are sometimes disparagingly referred to as "taxonomists". The first part of the sorry story of how Nature published the name "Leviathan" without checking whether the name had already been used, is detailed at:
http://svpow.wordpress.com/2010/06/30/is-the-new-miocen-sperm-whale-leviathan-validly-named/
and the latter part at:
http://en.wikipedia.org/wiki/Livyatan_melvillei
Richard Sproat said,
September 17, 2010 @ 3:21 pm
"Would a paper that made some blatantly wrong claim about genetics be published in such venues?"
Soon after I finished writing my Last Words piece, the following article came out in the New York Times:
http://www.nytimes.com/2010/07/09/science/09age.html?_r=2&hpw
This was particularly amusing since it was a nice counterexample to my assumption.
ano said,
September 17, 2010 @ 3:31 pm
I think you're being unfair. As far as I can tell, the paper of Rao et al does not claim to "prove" that the Indus script is writing; only that the entropy test provides "evidence for the linguistic hypothesis by showing that the script’s conditional entropy is closer to those of natural languages than various types of nonlinguistic systems" — that is, the Indus script satisfies a basic sanity check for languages, which (at least very slightly) increases our strength of belief that it is language. Yes, it would be good if they had analysed more man-made non-linguistic systems, but their paper is fine (if weak) as it is: it seems that the only known non-linguistic systems with similar characteristics are those obtained by tuning a parameter, not ones that actually occur. Sure the paper may be too weak for Science, but it is by no means "trivially and demonstrably wrong" as Sproat asserts.
In contrast, Sproat&co.'s screed and ad hominem attacks, as well as the unjustified claims in their paper (starting with its title: "The Collapse of the Indus-Script Thesis: The Myth of a Literate Harappan Civilization") are the ones that should bear more scrutiny. The blog post to which Sproat linked in the first comment (in a rare display of fairplay) is worth reading.
Mark F. said,
September 17, 2010 @ 4:40 pm
What is the alternative to peer review? ArXiv? But suppose you're running a conference? If you want a systematic way to recommend a selection out of all the research that is being done, I don't see any alternative to asking people to make the recommendations.
[(myl) Like democracy, peer review is the worst imaginable system, except for all the alternatives.
But seriously, the problems with peer review are well known: some reviewers say "yes" as long as they approve of the affiliations and conclusions of the authors, and the paper has the general appearance of a respectable publication in the field in question; other reviewers say "no" if they don't approve of the authors or if they and their friends are not adequately cited; others nit-pick details of method or interpretation way beyond the point of diminishing returns; some reviewers are cheerleaders for their discipline while others are more relentlessly critical the closer the work is to their own area; etc.
And the whole process was basically designed to deal with a situation in which publication space (typesetting and paper and distribution costs) was a critically scarce commodity. This is obviously no longer true, and the long publication delays and space constraints of standard journals slow down the rate of communication and innovation unnecessarily.
This is not to say that critical evaluation is unnecessary. On the contrary, one of the current problems is that the evaluation papers get is (a) not critical enough, and (b) mostly secret. Combine that with delays of a year or 18 months before publication, and you have (IMHO) a seriously broken system. How to make it better? Well, there are various ideas around, but ArXiv trackbacks are certainly one interesting step (out of many possible innovations).]
Richard Sproat said,
September 17, 2010 @ 6:50 pm
@ano:
"I think you're being unfair. As far as I can tell, the paper of Rao et al does not claim to "prove" that the Indus script is writing; only that the entropy test provides "evidence for the linguistic hypothesis by showing that the script’s conditional entropy is closer to those of natural languages than various types of nonlinguistic systems" — that is, the Indus script satisfies a basic sanity check for languages, which (at least very slightly) increases our strength of belief that it is language. "
Read my CL piece, and see what I actually say. I don't say there that they claim that entropy proves the case. I question whether entropy provides the additional evidence they claim it does. In my reply to their response to my piece (see http://www.cslu.ogi.edu/~sproatr/newindex/response.pdf) I elaborate on this point, and describe a couple of ways in which one could interpret their claims, and evaluate whether under either of those claims, the entropic evidence tells one anything.
I don't know how to evaluate the objection to parameter tuning, which I find a puzzler — though hardly a new puzzler since I've seen this one before. If a memoryless model can "look" linguistic according to a measure that is supposed to tell one there is meaningful structure, then I fail to see the validity of the objection that one has to tune a parameter of the model to get a match. In any event, as I point out in my piece, there *are* non-linguistic systems — heraldry — as well as memoryless systems — artificially generated Indus "texts" that have the same unigram frequencies, but (obviously) not the bigram dependencies — that also look linguistic. Neither of those involved tuning any parameters.
The term "screed" seems to be a popular one to apply to the Farmer et al 2004 paper, suggesting a kind of party-line view on this whole issue. That's to be expected. But in any case, our work *has* received scrutiny: certainly there are no shortage of people who have argued against it: see the bibliography to Rao's response for a list of such papers. One can evaluate our arguments and those of the opponents, and see which ones one finds more convincing.
Rahul Siddharthan said,
September 18, 2010 @ 8:48 am
In response to ano, I just wanted to remark that I find Richard's CL article out of character: not his act of 'fairplay' in linking here to my rather critical blogpost. Previously I found him to be reasonable and thoughtful in his comments, and attributed any vituperativeness in his articles to his coauthor Steve Farmer. But in the CL article there was no coauthor. Anyway, I have nothing to add here to what is on my blog, but the comments I got were constructive, and hopefully some good will yet emerge from this.
Rajesh Rao said,
September 18, 2010 @ 6:28 pm
As Heather W. indicated above, we have written a response to Richard's article, which will appear in Computational Linguistics 36(4). You can read the preprint here (the PDF copy is available here). Lee et al.'s response and Richard's reply will also appear in the same issue of the journal.
With respect to Richard's question above regarding the difference between the original bigram entropy plot and the more recent block entropy plot comparing Fortran to various languages, the difference is attributable to the method used to calculate entropy. The original plot utilized Kneser-Ney smoothing while the block entropy plot was based on a more sophisticated Bayesian entropy estimator (Nemenman et al., 2002).
As I mention in a different blog, I regard our work as only an initial step in the general research area of exploring statistical measures for characterizing linguistic and nonlinguistic systems. It would be interesting to see how block entropy compares with other statistical measures. Richard mentions in that blog that he has written a grant application to study such questions using large corpora. I think this is a research direction well worth pursuing.
Richard Sproat said,
September 18, 2010 @ 8:14 pm
Thanks Rajesh. I am pleased to end this discussion on a consensus on at least one issue.
Max Bane said,
September 18, 2010 @ 9:41 pm
On the topic of peer review, and how it might break, this article recently posted to arxiv might be of interest:
http://arxiv.org/abs/1008.4324v1
ano said,
September 19, 2010 @ 4:16 am
@Rahul: I guess the display of fairplay is rare for debates about this topic and in Indology, not rare for Richard Sproat. :-) Didn't mean to make a personal attack.
To me (an outsider to linguistics, but not to statistics), the fact that both unigram and bigram characteristics turn out roughly as one would expect for language is some evidence. As a crude analogy (not to suggest that the evidence is so weak, but just an analogy): suppose you had some long text in strange writing, which you suspected was English written in another script. It follows from elementary Bayesian probability that if you counted the number of symbols and found there were about 26 "letters", the probability you attach to its being English has increased (slightly) since before you counted — even though there are zillions of other things that may turn out to have 26 symbols, even many human languages other than English. To claim that this work hasn't changed anything and we knew it already is incorrect.
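The Bayesian update behind the 26-letter analogy can be written out as a toy calculation. All of the numbers below are invented purely for illustration; the only structural point is that any observation that is more likely under the "English" hypothesis than under the alternatives must raise the posterior, however slightly.

```python
# Toy Bayes'-rule update for the 26-letter analogy. Every number here is made up.
def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """P(H | observation) from a prior and the two likelihoods."""
    evidence = prior * p_obs_given_h + (1 - prior) * p_obs_given_not_h
    return prior * p_obs_given_h / evidence

prior = 0.10                 # hypothetical prior that the text is English
p_26_given_english = 0.9     # hypothetical: English nearly always shows ~26 symbols
p_26_given_other = 0.1       # hypothetical: most other systems would not

print(posterior(prior, p_26_given_english, p_26_given_other))  # 0.5: belief rises
```

The same arithmetic also shows why the update can be small: if the competing systems were nearly as likely as English to show 26 symbols, the posterior would barely move.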
Jason Eisner said,
September 19, 2010 @ 7:06 am
This isn't the first time that computational linguists have objected to a paper by "outsiders" published in a prestigious "outside" forum. In January 2002, Physical Review Letters published a paper on using gzip compression ratios to solve problems like text categorization, and it got some press. The next month, the computational linguist Joshua Goodman posted a response to arXiv.
Physical Review Letters declined to publish his comment, but it is worth reading on arXiv as another data point.
(In the interest of fairness, I note that the authors responded. There seems to be a bunch of further commentary here, including an accusation that the authors' response was experimentally incompetent.)
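For readers who don't know the technique being argued over: the basic idea is to append the unknown text to each candidate corpus and see where the appended text compresses most cheaply, using the increase in compressed length as a rough proxy for cross-entropy. The sketch below is only an illustration of that general approach, not the exact protocol of the PRL paper or of Goodman's comment, and the corpora and test sentence are made up.

```python
# Rough sketch of compression-based text categorization with zlib.
import zlib

def compressed_len(text):
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def classify(unknown, corpora):
    """Pick the corpus whose compressed length grows least when unknown is appended."""
    cost = {label: compressed_len(ref + unknown) - compressed_len(ref)
            for label, ref in corpora.items()}
    return min(cost, key=cost.get)

# Made-up miniature corpora for illustration only.
corpora = {
    "english": "the cat sat on the mat. the dog ate the bone. " * 20,
    "italian": "il gatto dorme sul tappeto. il cane mangia l'osso. " * 20,
}
print(classify("the cat ate the bone on the mat.", corpora))
```

Subtracting compressed_len(ref) is what makes the measure behave roughly like the cost of encoding the unknown text given a model adapted to the reference corpus, rather than a measure of the reference corpus's own compressibility.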
Rahul Siddharthan said,
September 20, 2010 @ 11:02 am
ano – nice analogy!
Jason – xkcd comes to mind. But physicists aren't always wrong…
Mark F. said,
September 23, 2010 @ 11:00 pm
myl – Your response has made me realize I was essentially straw-manning. I've read lots of news articles about people calling for the "end of peer review." I always thought that was crazy, but basically I was just refusing to understand the implied "as we know it."