If you followed my advice (in "Norvig channels Shannon contra Chomsky", 5/31/2011) and read all of Peter Norvig's essay "On Chomsky and the Two Cultures of Statistical Learning", you may have detected a certain restrained testiness in Norvig's response. The goal of this post is to give a bit of explanatory background, and to suggest why, on the whole, I share Norvig's reaction.
Here's a short passage from Noam Chomsky's invited lecture at NELS 41, 10/4/2010. (Apologies for the poor audio quality — this is a recording that I made on my cell phone. Disfluencies have been edited out of this fragment — a rough but more complete transcript of the entire lecture is here.)
The argument is well, take a look at physics, you know, real science. It's based on observations. But you know, not observations of things in their natural state, like you don't try to say determine the laws of motion by taking video tapes of leaves falling and do massive statistical analysis of them, and so on and so forth. So you do experiments, and in fact a lot of the experiments are thought experiments, including Galileo's classic experiments, and that goes right up to the present. But any experiment ((is)) a high level abstraction, and theory-internal, as everybody who does experimental work knows. [It's the] same ((if you're)) studying any other topic, say bee communication. I mean again, ((if you)) take a look at the work on bee science, it involves highly contrived, very intricate experiments that are radically abstracted from natural conditions. Nobody suggests studying bee communication, again, by taking a massive corpus, you know, [a] huge library of video tapes of bees swarming around and doing statistical analysis of it, and getting some prediction about what they're likely to do next.
There are two arguments yoked together here. One argument has to do with the goals of science: Chomsky pushes explanation over prediction, and never mind prediction's occasional intrinsic value (in the case of climate change or asteroid strikes or inflation rates), and its generic value as a check of a theory's correctness. (Chomsky dislikes prediction, I think, because he associates it with "statistical language models", which in his long-held view are incapable of describing syntactic structure, much less explaining it.)
The other argument has to do with the methods of science: Chomsky argues for "very intricate experiments that are radically abstracted from natural conditions". His disdain for mere description ("butterfly collecting", as he often calls it), and especially for "observations of things in their natural state", is well known. You can see it in the short passage quoted above, and if you read the rest of the transcript of his NELS 41 lecture, or his other works such as "Linguistics and Brain Science", you'll see that it's a recurring theme.
Let me start by saying that there's a way to take all this that makes it entirely correct. The key motive of science is explanation, and it's often essential to abstract away from the complexities of raw observation, and so on. I took courses from Chomsky as an undergraduate and a graduate student, and I'm grateful for what I learned from him, and for the eminently fair way that he always treated me. But increasingly, it seems to me, he has been elevating his personal distaste for the complexities of the real world into a systematic philosophy. To the extent that others accept these views, that acceptance excludes them from participation in (what I think are) the most promising and exciting current directions in the sciences of speech and language.
I'll let someone else address the physics piece of this, perhaps by considering the work that won the 2011 Gruber Cosmology Prize, a collaboration that started with what you could call a "massive corpus of galaxies swarming around":
The particular evidence that motivated the creation of the DEFW collaboration came in the form of a 1981 Harvard-Smithsonian Center for Astrophysics survey of 2400 galaxies at various distances—at the time, an extraordinary census of how the heavens look on the largest scales. (Davis led the project.) What the CfA survey showed was an early hint of what is today called “the cosmic web”—galaxies grouped into lengthy filaments, or superclusters, separated by vast voids.
But "bee science" is something that I know a little bit about, especially the part of it that has to do with bee communication. The father of modern "bee science" was Karl von Frisch, who got the Nobel Prize in 1973 for his work in this area. And von Frisch was no enemy of observing "things in their natural state", and no friend of "highly contrived … experiments that are radically abstracted from natural conditions". In the preface to his popular work Aus dem Leben der Bienen, published in an English translation by Dora Ilse as The Dancing Bees in 1954, he wrote:
If we use excessively elaborate apparatus to examine simple natural phenomena Nature herself may escape us. This is what happened some forty-five years ago when a distinguished scientist, studying the colour sense of animals in his laboratory, arrived at the definite and apparently well-established conclusion that bees were colour-blind. It was this occasion which first caused me to embark on a close study of their way of life; for once one got to know, through work in the field, something about the reaction of bees to the brilliant colour of flowers, it was easier to believe that a scientist had come to a false conclusion than that nature had made an absurd mistake.
Here's a photograph of one of von Frisch's own laboratories, in his native Austria, shown in Figure 49 from his 1950 published lecture series Bees: Their Vision, Chemical Senses, and Language, Cornell University Press:
And here's a (summarized) example of the sort of data that he collected in these laboratories:
It's true that he didn't use "a massive library of video tapes of bees swarming around". This was partly because he did his research before video tapes were invented, and partly because video tapes wouldn't in any case have been an effective way to keep track of bee flights over thousands of meters of mountain meadows.
Here's an example of the sort of detailed track of bee flights in the wild that bee scientists made in the middle of the 20th century. This comes from Martin Lindauer, Communication Among Social Bees, Harvard University Press, 1961. Lindauer was von Frisch's student, and the monograph in question is the published version of the Prather Lectures in Biology, given at Harvard in 1959:
And here's Lindauer's detailed summary of the 25-day life of one particular bee — it's compiled from a series of day-by-day observation sheets that record in exquisite detail how much time the animal spent on each task, and when:
Observations like these were not compiled for their own sake, of course, but because bee scientists wanted to understand how bees navigate, how the division of labor in the hive is determined, and so on. For them, there was no radical divorce between "massive libraries" of observations and scientific insight — the evolving explanations motivated the observations, which both motivated and informed the explanations.
And to the extent that bee science had problems in the middle of the 20th century, it was lack of adequate data — their libraries of observations were not nearly massive enough. These painstakingly detailed observations of bee behavior, in settings ranging from completely natural through partly artificial to fully artificial, were simply too sparse to allow a wide enough range of theories to be evaluated and compared.
Even in 2003, Jennifer Fewell wrote ("Social Insect Networks", Science 301(5641):1867-1870) that
With these global attributes in place, how does information transfer within a social colony actually occur? Unfortunately, we do not yet have enough empirical data to answer this question well.
And she concludes:
What should be done next in the exploration of social groups as networks? We need to expand our models from elegant descriptions of single behaviors to incorporate the more complex dynamics of the group as a whole. We also need to test those models empirically on a wider range of social systems. Finally, to understand the evolutionary significance of network dynamics, we must explicitly measure their fitness effects on the social group. This interplay between network dynamics and selection is just beginning to be explored, and social insects have the potential to be on the leading edge.
A variety of new instrumentation techniques are starting to make it possible to gather more and better bee-science data more cheaply and conveniently. Thus according to J.R. Riley et al., "The flight paths of honeybees recruited by the waggle dance", Nature 2005:
In the ‘dance language’ of honeybees, the dancer generates a specific, coded message that describes the direction and distance from the hive of a new food source, and this message is displaced in both space and time from the dancer’s discovery of that source. Karl von Frisch concluded that bees ‘recruited’ by this dance used the information encoded in it to guide them directly to the remote food source, and this Nobel Prize-winning discovery revealed the most sophisticated example of non-primate communication that we know of. In spite of some initial scepticism, almost all biologists are now convinced that von Frisch was correct, but what has hitherto been lacking is a quantitative description of how effectively recruits translate the code in the dance into flight to their destinations. Using harmonic radar to record the actual flight paths of recruited bees, we now provide that description.
Here's a figure from that paper, derived from statistical analysis of the recorded flight-path data:
This is not yet "massive statistical analysis": there were only 17 foragers tracked, and they were treated entirely as individuals, except with respect to their interactions with the dancing foragers who recruited them. But this is mostly because the scientists had to glue a little radar transponder to every bee studied in this way. If bee scientists could easily get tracks of thousands or millions of foragers, co-indexed with the various interactions of these individuals within the hive, I strongly suspect that they'd be very happy indeed.
And it's exactly that kind of information — about human speech and language use — that today's "massive corpora" of diverse digital data streams offer to speech and language scientists. No matter how great and even well-deserved Noam Chomsky's celebrity might be, I doubt that his distaste for "massive statistical analysis" of the complexities of "things in their natural state" will be able to keep speech and language scientists from taking advantage of this opportunity.
[With some trepidation, I'm going to leave comments open. A warning in advance: unsupported and otherwise content-free expressions of opinion will be deleted, as will rant-like displays on any side of the various issues involved.]