Norvig channels Shannon contra Chomsky


According to Stephen Cass, "Unthinking Machines", Technology Review 5/4/2011:

Some of the founders and leading lights in the fields of artificial intelligence and cognitive science gave a harsh assessment last night of the lack of progress in AI over the last few decades.

During a panel discussion—moderated by linguist and cognitive scientist Steven Pinker—that kicked off MIT's Brains, Minds, and Machines symposium, panelists called for a return to the style of research that marked the early years of the field, one driven more by curiosity rather than narrow applications.

The panelists were Marvin Minsky, Patrick Winston, Emilio Bizzi, Noam Chomsky, Barbara Partee, and Sydney Brenner. Based on Cass's short summaries, it sounds like an interesting discussion. I hope that recordings and/or transcripts will be available at some point — all that I've found so far is the symposium's advertisement on the MIT150 web site,  another write-up at MIT News, and a few other notes here and there. (Video for one of the other MIT150 symposiums is available here, so perhaps this will appear in time.)

But Cass's brief sketch of what Chomsky said was enough to provoke a lengthy and interesting response from Peter Norvig: "On Chomsky and the Two Cultures of Statistical Learning".

Norvig quotes Cass quoting Chomsky:

Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don't try to understand the meaning of that behavior. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way. "That's a notion of [scientific] success that's very novel. I don't know of anything like it in the history of science," said Chomsky.

Norvig's essay "discusses what Chomsky said, speculates on what he might have meant, and tries to determine the truth and importance of his claims". If you're at all interested in these issues, I strongly recommend that you read the whole thing.

Among other things, Norvig discusses the relationship between science and engineering, and considers Claude Shannon's approach to language modeling as a counterpoint to Chomsky's approach. I wrote about both of these topics in my obituary for Fred Jelinek in Computational Linguistics. After describing some of the historical background as Fred lived (and created) it, I close with the following point:

Independent of their value in practical applications, the algorithms developed by [the application of information theory to speech and language technology] offer marvelous new tools for scientists. Applying these tools to the vast stores of digital speech and text now becoming available, we can observe linguistic patterns in space, time, and cultural context, on a scale many orders of magnitude greater than in the past, and simultaneously in much greater detail.

Rather than evoking the impact of particle accelerators, as the ALPAC report did, it may be more appropriate to compare these tools to the invention of the microscope and telescope in the 17th century: Everywhere we look, there are interesting patterns previously unseen.

In the published version of his 2009 ACL Lifetime Achievement Award speech, Fred Jelinek wrote (page 484):

I sat in [on Noam Chomsky’s lectures in 1961] and got the crazy notion that I should switch from Information Theory to Linguistics. I went so far as to explore this notion with Professor Chomsky. Of course, word got around to my adviser Fano, whom it really upset. He declared that I could contemplate switching only after I had received my doctorate in Information Theory. I had no choice other than to obey. Soon thereafter, my thesis almost finished, I started interviewing at universities for a faculty position. After my job talk at Cornell I was approached by the eminent linguist Charles Hockett, who said that he hoped that I would accept the Cornell offer and help develop his ideas on how to apply Information Theory to Linguistics. That decided me. Surprisingly, when I took up my post in the fall of 1962, there was no sign of Hockett. After several months I summoned my courage and went to ask him when he wanted to start working with me. He answered that he was no longer interested, that he now concentrated on composing operas.

Discouraged a second time, I devoted the next ten years to Information Theory.

Perhaps, in the end, this was the best way for Fred to contribute to linguistics. He lived to see the triumph of his ideas in speech and language engineering; we should remember him as we explore the world that his ideas are opening up to us in speech and language science.

On the basis of my conversations with Noam Chomsky when I was a student, and my attention to his writings since then, I'm certain that he continues to disagree strongly with these ideas, and I believe that Peter Norvig's extrapolation of Chomsky's views from the brief quote in Cass's Tech Review article is mostly accurate. In Chomsky's keynote address at NELS 2010, he made similar arguments about (what he feels is) the unscientific nature of statistical modeling as a method, and of observational prediction as a goal, again using the bee-corpus analogy.

I recorded Chomsky's NELS 2010 talk on my cell phone, and when I have the time, I'll post some transcribed excerpts as a contribution to clarifying his side of the argument. One curious thing that will emerge is that Chomsky and Norvig use remarkably similar rhetoric about scientific abstraction on opposite sides of the argument about the scientific interest and relevance of statistical modeling.

I should also note that the other MIT150 panelists dealt with other — interesting-sounding — issues. For example, the Tech Review article quotes Barbara Partee as saying that "Really knowing semantics is a prerequisite for anything to be called intelligence". I hope that she'll post her talk, or a link to it.  (This is a more controversial position than you might think. For example, at the recent meeting concluding five years of DARPA's GALE program, there was a fascinating discussion of whether knowing semantics, in some sense, is essential for improving Machine Translation, with strongly-expressed beliefs on both sides.)

The basic premise of the MIT150 symposium is a fascinating one:

"You might wonder why aren't there any robots that you can send in to fix the Japanese reactors," said Marvin Minsky, who pioneered neural networks in the 1950s and went on to make significant early advances in AI and robotics. "The answer is that there was a lot of progress in the 1960s and 1970s. Then something went wrong. [Today] you'll find students excited over robots that play basketball or soccer or dance or make funny faces at you. [But] they're not making them smarter."

This is exactly the opposite of the general view among people working in related fields these days. Most of them subscribe to the belief that the "classical AI" of the 1960s and 1970s, based on the idea that intelligence is applied logic, led into an impassable swamp; and that practical progress resumed in the 1980s with a turn towards the idea that intelligence is applied statistics. As someone who likes to watch these cultural pendulums swing back and forth, I was of course intrigued, and would love to read or hear more about Minsky's construal of this story.

Speaking of such intellectual pendulums, I also thought it was sociologically interesting that the Tech Review author tagged Minsky as having "pioneered neural networks in the 1950s". This is true as far as it goes: Minsky did construct an analog neurocomputer in 1951, the Snark, as a test of Hebbian learning theory, and his 1954 doctoral dissertation was entitled "Neural Nets and the Brain Model Problem". But his most influential contribution in this area was a strongly negative one, described this way by Robert Hecht-Nielsen (Neurocomputing, p. 16):

The final episode of this era was a campaign led by Marvin Minsky and Seymour Papert to discredit neural network research and divert neural network research funding to the field of "artificial intelligence"….The campaign was waged by means of personal persuasion by Minsky and Papert and their allies, as well as by limited circulation of an unpublished technical manuscript (which was later de-venomized and, after further refinement and expansion, published in 1969 by Minsky and Papert as the book Perceptrons).

This work effectively pushed "neural networks" into the intellectual margins until the 1980s, with the turn in the other direction conveniently marked by the 1986 publication of the Parallel Distributed Processing volumes.

Back at the MIT150 symposium, Patrick Winston

… speculated that the magic ingredient that makes humans unique is our ability to create and understand stories using the faculties that support language: "Once you have stories, you have the kind of creativity that makes the species different to any other."

Again, I'd like to learn more about his take on this idea. My own immediate reaction, uninformed by any knowledge of what Winston really meant, is to quote Jake at the end of The Sun Also Rises: "Isn't it pretty to think so?"

As these are topics where the light-to-heat ratio is not always as high as one might like, and where the correlation between knowledge and passion is (weakly) negative, I'll leave comments closed. If you have something to contribute, send it along by email.

Update — Barbara Partee writes:

You've nudged me out of procrastinating on updating my website a smidgen – I've now posted my own 10-minute remarks for that panel.

By the way, I'm pretty sure Peter Norvig was in attendance at that panel — he spoke in two of the other panels himself, and although I didn't know him by sight until I heard him in his panels, he's striking looking (and was wearing bright Hawaiian-style shirts) and I think I did notice him there that first evening as well. So I think he heard all of Chomsky's remarks and didn't have to just extrapolate from what was reported.

Update #2 — from Mike Travers:

I'm a former student of Minsky's, and since I don't really work in the field any more I feel like I can give a pretty unemotional representation of his views. Minsky and Chomsky, for the purposes of this discussion, are both cognitivists or structuralists — they are interested in the structure, rules, and dynamics of the mind.  Statistics from this perspective can be a useful tool but cannot be the center of the universe.  Statistics without structure is meaningless, it can only be a means that the mind uses to derive structure from experience.

It is a mistake to identify Minsky with logicism.  From his perspective, formal logic is just as much a deadly intellectual sidetrack as robotics.  A better term would be "symbolist", although he might quarrel with that label as well.  Minsky does think about the symbolic structures of the brain, but he does not think of them as a logician would (eg, requiring that they be consistent).  A good summary of his views from 1990 is here.

Re stories:  how about if we call them "case-based, experientially-rooted sequentially-organized semantic structures"?  Would that make the idea that the mind is rooted in narrative seem less pretty and more true?  As someone who has pushed this idea forward a bit, I view it as another attempt to navigate past the sandtrap of logicism.  Logic views the mind as made of facts, narrative theory views it as made up of stories we continually tell ourselves. The latter does sound pretty, but it also sounds more convincing and realistic, at least to some of us.  For more on this viewpoint, Google "narrative intelligence" or look at the work of case-based reasoning from Roger Schank and students.

I'm certainly familiar with Roger Schank's work, and with the work of many of his students, and again, my impression was that his approach to case-based reasoning — "scripts" and all that — pretty well petered out in the 1980s.  The Wikipedia article on him says that "Schank and his students at Yale initially applied these ideas to the problem of computer recognition of English (called natural language understanding) in the late 1970s and early 1980s, but progress eventually stalled and those methods fell into disuse." This corresponds to my experience of activity in the field.

So if narrative analysis is the future of AI, then again, it's that intellectual pendulum swinging on a 20- or 30-year cycle.

I should add that I don't assume that "unfashionable" means "wrong".  Being a phonetician, I was a sympathizer with the "cybernetic underground" during the wilderness years of the 1970s and 1980s, and I was puzzled by the belief (widespread at the time in the MIT AI lab as well as the MIT linguistics department) that counting higher than one was a sign of error.  Being a linguist, I similarly sympathize with today's symbolist underground(s), and I'm similarly puzzled by the view that semantic or narrative analysis is a snare and a delusion.

It would be nice to see some serious new work on understanding as narrative analysis. But Schankian scripts as a model for this revival of symbolic AI? Color me skeptical.

Update #3 — James McDermott writes:

Your readers might enjoy reading and commenting here.

One of the long comments there is mine (I'm jmmcd on reddit). There's even a mention of Language Log (I say it's an example of linguists interested in usage, someone else isn't impressed).

Update #4 — Roger Schank writes:

If you wanted to find out about my work, you might have looked at my site and not wikipedia, where you would have a description of our reminding machine which is getting a lot of interest from organizations with a need to capture their corporate memory.

And if you like wikipedia, you will find that CBR is alive and well (and has nothing to do with scripts so clearly you hadn't been paying attention prior to dismissing it).

Update #5 — Bill Benzon writes:

I've attached Martin Kay's 2005 acceptance speech for the ACL Lifetime Achievement Award. His concluding remarks play well in the statistics vs. 'structure' squabble. Start with the last paragraph on page 437:

Now I come to the fourth point, which is ambiguity. This, I take it, is where statistics really come into their own. Symbolic language processing is highly nondeterministic and often delivers large numbers of alternative results because it has no means of resolving the ambiguities that characterize ordinary language. This is for the clear and obvious reason that the resolution of ambiguities is not a linguistic matter. After a responsible job has been done of linguistic analysis, what remain are questions about the world. They are questions of what would be a reasonable thing to say under the given circumstances, what it would be reasonable to believe, suspect, fear or desire in the given situation. If these questions are in the purview of any academic discipline, it is presumably artificial intelligence. But artificial intelligence has a lot on its plate and to attempt to fill the void that it leaves open, in whatever way comes to hand, is entirely reasonable and proper. But it is important to understand what we are doing when we do this and to calibrate our expectations accordingly. What we are doing is to allow statistics over words that occur very close to one another in a string to stand in for the world construed widely, so as to include myths, and beliefs, and cultures, and truths and lies and so forth. As a stop-gap for the time being, this may be as good as we can do, but we should clearly have only the most limited expectations of it because, for the purpose it is intended to serve, it is clearly pathetically inadequate. The statistics are standing in for a vast number of things for which we have no computer model. They are therefore what I call an “ignorance model”.

Then:

So, just a couple of final reflections. Statistical NLP has opened the road to applications, funding and respectability for our field. I wish it well. I think it is a great enterprise, despite what I may have seemed to say to the contrary.

Language, however, remains a valid and respectable object of study and I earnestly hope that the ACL will continue to pursue it.

My own perspective on these things is that the resources and methods of statistical speech and language processing, rather than being some sort of alternative or competitor or replacement for the scientific study of speech, language, and communication, instead give us wonderful new tools for doing science in this area.
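For readers who haven't met the kind of model at issue, Kay's phrase "statistics over words that occur very close to one another in a string" describes what is usually called an n-gram model, in the tradition of Shannon's approach to language modeling. Here is a minimal sketch of the simplest case, a bigram model with plain relative-frequency estimates. The toy corpus and the function names are my own inventions for illustration; nothing here is taken from Kay, Norvig, or Jelinek, and real systems add smoothing and far more data.

```python
from collections import Counter, defaultdict


def train_bigram_counts(tokens):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, nxt):
    """Relative-frequency estimate of P(nxt | prev).

    Returns 0.0 when `prev` never occurred as a context, which is exactly
    the kind of gap that smoothing methods are meant to fill.
    """
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return followers[nxt] / total


# Toy corpus, invented purely for illustration (an assumption, not real data).
corpus = "the bee returns to the hive and the bee dances".split()
counts = train_bigram_counts(corpus)

# "the" occurs three times as a context; "bee" follows it in two of them.
print(next_word_probability(counts, "the", "bee"))
```

The model knows nothing about bees, hives, or dancing. It records only which word tends to come next, which is what makes it both useful as an instrument and, in Kay's terms, an "ignorance model".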

Update — more here.


